The Elicited Probability approach is based on a transition matrix which
relates  the  current  state  vector  to  a  set  of  state  transformation  operators. 

The  matrix  elements  are  conditional  probabilities  elicited  from  experts  (or  can 
be  determined  by  collecting  statistics).  The  state  transformation  operators  are 
rules  which  dynamically  change  the  state  of  the  simulation  when  selected  by  the 
application of Bayesian algorithms. The basic mechanism can be used to select
operators in a hierarchic manner by incorporating them in higher level
transformation operators.

The Adaptive Decision approach uses pattern recognition to learn opponent
behavior from instructor opponent controllers (operators). This approach is
based on a pattern classifier and is used to identify biases in operator
decision policy as a response to classes or patterns in the input data. The
Multi-Attribute  Utility  (MAU)  model  is  used  to  capture  the  decision  behavior  of 
the  operator.  In  the  MAU  model,  the  consequences  of  every  action  are  considered 
to  be  decomposable  according  to  a  single  common  set  of  attributes. 

The  Heuristic  Search  approach  provides  a  mechanism  by  which  the  opponent 
responds  to  actions  taken  by  friendly  forces  with  a  course  of  action  which  leads 
to  the  achievement  of  some  enemy  goal.  A  state  space  model  is  used  to  represent 
the  problem  domain.  The  states  are  a  complete  description  of  the  tactical 
situations  as  they  exist  at  a  particular  instant  of  time.  An  action  converts 
one  state  into  another.  The  opponent  asks  the  question,  "What  sequence  of 
actions  can  transform  the  current  state  into  a  desired  goal  state?"  The  basic 
search  algorithm  begins  at  a  start  node  and  expands  successive  nodes  until  a 
goal  node  is  encountered.  Then  the  path  from  the  initial  node  to  that  goal  node 
is  the  solution  sought.  Heuristic  Search  algorithms  use  domain  specific 
knowledge  to  guide  the  search.  Heuristic  knowledge  may  apply  to  node  expansion 
or  to  path  evaluation.  In  either  case  heuristic  knowledge  is  used  to  reduce  the 
searching  effort.  Specific  Heuristic  Search  algorithms  are  discussed. 

The Production Rules approach uses sets of situation-action pairs, called
"productions," to transform the current state to the next state. The productions
represent the problem specific knowledge. In addition to productions, the
Production Rule system contains a triggering mechanism that applies those that
are applicable, causing the situation to change. AND/OR graphs represent the human
reasoning process and can be used to answer the questions of how or why a
particular  conclusion  was  reached  by  the  system.  Also,  the  user  can  hypothesize 
a  conclusion  or  desired  final  state  and  use  the  productions  to  work  backward 
toward  an  enumeration  of  the  facts  that  would  support  the  hypothesis. 

A set of attributes for rating each approach is defined and described.

The attributes fall into three general categories: those related to the
modeling capability of the approach, those related to the development required
to use the approach in a sub simulation, and those related to the expected
performance of a simulation system based on a given approach. These attributes
are then used to rate each approach. Finally, several representative decisions
are discussed and the method of application for each approach is described.
PREFACE

Simulators are employed to train military personnel in a wide range
of combat-related skills, from the performance of simple procedural tasks
to  the  execution  of  complex  interactive  missions.  A  primary  design  goal 
in  the  specification  of  simulator  equipment  is  a  sufficient  degree  of 
functional  fidelity  to  allow  a  high  degree  of  transfer  of  training  to 
manifest  itself  in  the  later  performance  of  the  operational  task. 

For  the  training  of  simpler,  procedural  tasks  an  acceptable  level  of 
fidelity  can  be  achieved  by  creating  a  simulation  of  the  operational 
equipment.  However,  when  tasks  with  a  high  cognitive  component  are 
simulated,  such  as  those  associated  with  tactical  performance,  it  becomes 
necessary  to  simulate  the  external  environment  under  which  the  operational 
mission  is  carried  out. 


In  the  context  of  tactics  training,  the  most  important  aspect  of  the 
combat  environment  is  the  adversary.  Current  tactics  simulators,  such  as 
the Submarine Combat Systems Trainers (21A37 series), have an adversary
which  is  controlled  by  an  instructor  during  training  exercises.  This 
approach  has  several  shortcomings,  among  them:  1)  the  instructor  is  a 
valuable  resource  who  should  be  used  more  effectively  in  other  functions, 
such  as  monitoring  the  performance  of  the  trainees,  2)  the  tactical 
abilities  of  instructors  vary  widely,  3)  it  is  very  difficult  for  an 
instructor  to  maneuver  multiple  adversaries,  and  4)  since  the  instructor 
has  the  advantage  of  knowing  exactly  what  own  ship  is  doing,  it  is 
difficult  for  him  to  maneuver  the  target(s)  in  a  realistic  fashion. 


One approach to unburdening the instructor and, at the same time,
creating adversary targets with a higher degree of fidelity lies in
automating the maneuvering of the targets. The computer modeling of physical
systems  is  a  cornerstone  of  training  simulation.  Many  of  the  same 
techniques  can  be  applied  to  modeling  an  adversary.  However,  the  modeling 
of  intelligent  behavior  appears  to  be  a  much  more  complex  problem. 


The  objective  of  the  current  study  was  to  survey  a  spectrum  of 
modeling  techniques  and  isolate  several  candidates  which  could  be  applied 
to  the  problem.  These  candidate  techniques  were  then  further  analyzed 
and  evaluated  against  certain  training  criteria.  Recommendations  are 
made  concerning  each  modeling  approach. 


Robert Ahlers
Scientific  Officer 
SECTION I
INTRODUCTION 


OBJECTIVES 

This  final  report  provides  a  presentation  and  evaluation  of  several 
alternative  models  potentially  useful  as  an  intelligent  opponent  model. 
These models are intended to be used to simulate realistically the tactical
behavior of enemy submarines within the Navy Submarine Combat System
Trainers  (SCST). 

The  objectives  of  the  program  are  to: 

a.  Analyze  the  requirements  of  Navy  submarine  tactical  trainers 
with  respect  to  the  tactical  behavior  of  simulated  enemy  submarines. 

b. Identify the knowledgeable opponent model algorithms and techniques
applicable to submarine tactics.

c. Evaluate each model to assess its tactical maneuvering capabilities,
trainability, software requirements, trainee performance measurement,
and required research and development.

This report covers all three objectives and specifically
includes,  with  minor  changes,  the  two  quarterly  reports  that  cover 
objectives  (a)  and  (b).  It  goes  beyond  these  reports  in  providing  a 
detailed  evaluation  of  each  model,  a  compatibility  analysis  of  each  model 
for  some  of  the  specific  decision  tasks  needed  in  the  submarine  combat 
mission,  and  a  recommendation  for  an  overall,  best  model. 

BACKGROUND 

Current  Navy  submarine  tactical  simulators  provide  enemy  submarine 
maneuver  capability  in  the  form  of  either  (1)  pre-determined  maneuver 
patterns  or  (2)  controlled  tactics  performed  by  human  operators.  These 
forms  of  tactical  control  are  inadequate  for  modern  Naval  training 
objectives. "Canned" maneuver patterns are not responsive to friendly
submarine tactics performed by the student trainee and present an unrealistic
environment. Further, the student may learn the pre-determined
enemy tactical patterns with continued simulator experience, thus
invalidating performance measures. On the other hand, the human controller's
main function is to monitor the trainee and evaluate his performance.
This function permits little time to maneuver enemy submarines in response
to the trainees' tactics. The problem is compounded when multiple
targets are involved. Assigning a full-time controller to each target
is prohibitively expensive in terms of manpower requirements. Further,
the target behavior resulting from a human controller will not exhibit
the consistency necessary to train students on all types of tactical
maneuvers they may encounter.
A computer-driven "knowledgeable opponent" submarine model will
alleviate many of the problems inherent in pre-determined or human-driven
models in the following ways (Ahlers, 1978):

a. Provides Action Feedback for Trainee's Inputs. The trainee
will receive "operationally valid" feedback rather than abstract performance
measures which are not presented in real time. The feedback, in the
form of target responses, will be displayed on the trainee's primary
display. Thus, no time-sharing between task and performance displays
would be necessary; full attention could be directed to the task display.

b. Provides an Optimum Model for the Trainee to Emulate. This is
particularly important for individualized instruction as it allows the
trainee to "discover" effective tactics.

c. Provides Infinite Variety of Tactical Configurations. Since the
target will be responsive to the trainee's tactics and will be maneuvered
differently  as  learning  takes  place,  broad  experience  in  unique  situations 
will  be  provided. 

d.  Provides  an  Equally  Matched  Opponent  at  any  Level  of  Trainee's 
Expertise.  By  varying  the  responsiveness  and  the  appropriateness  of  its 
maneuvers,  the  target  can  be  modified  to  remain  challenging,  but  beatable, 
for  a  trainee  at  any  level  of  proficiency.  The  complexity  of  the  target 
could  range  from  a  straight-running  target,  for  use  in  early  training,  to 
a  highly  sophisticated  opponent  with  optimum  sensor  information  for  use 
with  highly  experienced  approach  officers. 

e. Enhances Intrinsic Motivational Properties of the Training Task.
Training scenarios will become true "one-on-one" contests, and the
possibility  of  defeat  will  encourage  the  trainee  to  attend  to  the  task 
and  maintain  interest  in  it. 

f. Enhances Evaluation of the Trainee's Mastery of the Task.
Certain aspects of the knowledgeable opponent model may be exploited to
provide measures of the trainee's performance. For example, the length
of time the opponent maintains a tactical advantage is expected to
decrease as the trainee gains tactical knowledge and experience.

g. Allows Training Exercises to Reach a Legitimate Conclusion.
The knowledgeable opponent will win when it achieves a significant tactical
advantage. A "canned" target cannot win; it can only lose.
SECTION II


DECISION  ENVIRONMENT 

Before requirements for a knowledgeable opponent model can be identified,
the decision environment for the model must be established. Since
the  opponent  model  represents  the  rational  actions  of  an  enemy  submarine 
commanding  officer  (CO),  a  general  description  of  his  thought  processes 
and  decision  options  is  necessary.  Figure  1  shows,  in  flowchart  form, 
some  of  the  major  decisions  that  an  opposing  submarine  commanding  officer 
must  consider.  This  flowchart  was  obtained  through  the  cooperation  of 
the  tactical  instructors  at  the  SCST  facility  in  San  Diego,  California. 

The  first  contact  a  submarine  has  with  a  possible  enemy  submarine 
is  via  acoustic  sensors.  These  sensors  are  "passive"  since  they  only 
listen  for  sounds  and  emit  no  signals  of  their  own.  "Active"  sensors 
(sonar) emit signals and listen for their echoes. When a sound source is
determined  to  be  a  possible  enemy  submarine,  a  decision  must  be  made  as 
to  its  threat.  If  it  is  determined  to  be  threatening  due  to  its  location, 
a  decision  is  made  to  evade  counter-detection,  or  to  close  and  investigate 
with  the  possibility  of  attacking. 

Once  the  distance  between  the  submarines  is  close,  it  is  very  likely 
that  the  enemy  has  counter-detected,  and  therefore,  active  sensors  may 
be  used  for  more  accurate  information.  Such  sensors  are  not  used  early 
since  this  would  immediately  alert  the  opposing  submarine.  Active  sensors 
are  available  in  various  types,  and  the  specific  one  chosen  depends  on 
factors  such  as  ocean  temperature,  currents,  range,  etc. 

If the new information confirms the presence of a submarine, tactical
maneuvers begin. These maneuvers are to: (1) track the opposing submarine's
movements, (2) position for a possible attack, and (3) prepare to
evade or escape enemy attack, if necessary. If there is no war in
progress,  only  tracking  is  considered.  However,  if  a  wartime  situation 
exists,  a  weapon  (torpedo)  is  launched  when  the  range  is  sufficiently 
close.  After  the  launching  of  a  weapon,  the  submarine  commander  must 
decide  whether  to  evade  a  possible  counter-attack  or,  if  the  attack  was 
unsuccessful,  to  attack  again. 

The types of situations described above are typical of the high-level
decisions a submarine commander must make. Therefore, a knowledgeable
opponent model should be able to choose among similar types of
alternatives  at  the  proper  times.  These  include  not  only  decisions 
concerning  strategy  such  as  evasion,  sensor  resources,  attack  methods, 
weapons  choice,  etc.,  but  also  tactical  maneuvers  involving  course, 
speed,  depth,  etc. 
SECTION III
REQUIREMENTS  ANALYSIS 

GENERAL 

Requirements  of  the  model  refer  to  those  model  characteristics 
associated  with  the  training  objectives,  training  facility,  and  submarine 
behavior  which  are  necessary  for  realistic  training  exercises.  The 
opponent  model  should  be  compatible  with  the  following  requirements. 

MODEL  STRATEGIES 

General submarine strategies employed by the model will be determined
by the instructional objectives. The following are three typical
training  objectives  that  would  warrant  different  strategies: 

a. Battle Stations. A wartime encounter between the friendly submarine
and one or more hostile opponent submarines where torpedo attack
is  possible. 

b.  Surveillance.  A  wartime  encounter  between  the  friendly  submarine 
and  one  or  more  hostile  opponent  submarines  where  information  gathering, 
and  not  attacking,  is  the  mission. 

c.  KILO.  A  peacetime  encounter  between  a  friendly  submarine  and  a 
non-hostile  opponent  submarine  where  observation  and  tracking  are  the 
primary  objectives. 

These strategies determine the general behavioral characteristics
of the opponent submarine which will govern and control the maneuver
tactics.

PRE-CONTACT TACTICS

Pre-contact tactics are determined by the particular engagement
scenario being exercised. Since pre-contact tactics do not depend on
the movement or responses of the friendly submarine, they can be predefined
according to established and accepted tactical doctrines and
practices. Pre-contact behavior will include tactics implementing the
following mission activities:

a. Barrier Patrol Search. This mission is a submarine search pattern
along barriers such as coastlines, shipping lanes, known submarine
routes, etc.

b. Broad-Area Patrol. Patrolling a large expanse of ocean for
enemy submarines requires different tactical maneuvers, as well as specific
sensor types.

c. Choke-Point Narrow Pass. Patrolling a narrow undersea pass
demands different and more specialized tactics than monitoring a broad
area.
d. Transient Movement. During a transient movement mission, the
submarine  is  assumed  to  be  traveling  from  one  location  to  another  for  some 
specific  purpose.  Pursuing  a  straight  line  course  is  not  the  best  way  to 
avoid  detection;  thus,  various  tactical  maneuvers  must  be  simulated. 

CONTACT  TACTICS 

Tactics for submarine maneuvers during contact with enemy submarines
must be compatible with existing tactical doctrine. The decisions to be
made at each point are the course (0° - 359°), speed (knots), and depth
(feet). The objectives which determine the values of these parameters
are:

a. Maneuvers to fix the location of the friendly submarine.

b.  Maneuvers  to  gain  attack  position. 

c.  Maneuvers  to  evade  opponent  attack. 

d.  Maneuvers  to  evade  contact. 

The  tactics  doctrine  that  fulfills  the  above  objectives  can  be  found 
in  Navy  tactics  manuals. 

RELATED  DECISIONS 

Many  decisions  not  directly  connected  with  tactical  maneuvers  are 
vital  to  a  complete  model.  The  three  parameters  described  in  the  previous 
section  are  enough  to  specify  particular  maneuver  tactics.  However,  many 
other  related  decisions  must  be  made.  The  model  must  be  able  to  make  the 
following  decisions  at  the  proper  time  during  the  simulation.  The  model 
must decide:

a.  The  probability  of  a  contact  based  on  passive  sensors. 

b. Whether a contact represents a possible threat.

c.  Whether  to  approach  the  contact  or  evade. 

d.  Whether  to  stay  passive  or  use  active  sonar. 

e.  Which  weapon  to  fire  and  when. 

f.  Whether  or  not  the  submarine  is  within  the  weapon  range. 

g.  Whether  or  not  the  opponent  ship  has  fired  a  weapon. 

h.  Whether  or  not  to  use  decoys. 

i. Whether to run or hide in deep water while evading contact.
FEATURES NOT INCLUDED

The  knowledgeable  opponent  model  will  not  be  required  to  support  the 
following  simulation  features: 

a. Surface Ship and Periscope Contact. Since almost all training
exercises deal with submarine-to-submarine encounters, the SCST instructors
felt that simulation of either surface ship contact or periscope
contact should not be necessary.

b.  Sonar  and  Acoustic  Equipment  Performance  Variations.  During 
simulation  exercises,  the  performance  of  the  acoustic  equipment  aboard 
the  friendly  submarines  is  sometimes  degraded  for  training  purposes.  It 
will  not  be  necessary  for  the  model  to  operate  under  such  conditions. 

d. Multiple Submarine Strategies. Since radio silence will
usually be maintained between submarines during wartime, coordinated
strategies are not a necessary requirement. Each enemy submarine can
possess its own independent knowledgeable opponent model with provisions only
for collision avoidance and mutual attack avoidance.

TRAINING  REQUIREMENTS 

The  model  must  be  compatible  with  current  SCST  training  objectives. 
Independent  of  the  specific  features  of  the  model  are  considerations  and 
characteristics  that  are  required  for  the  training  objectives  of  the  SCST 
to  be  met. 

a. Training Management. The model must perform well enough
that the training instructors will actually be relieved of their
responsibilities for scenario management.

b.  Model  Override.  The  instructors  must  be  able  to  take  control  of 
the  opponent  submarine  at  any  time  and  maneuver  it  as  they  are  currently 
able  to  do. 

c. Performance Measurement. The tactics and behavior of the enemy
submarine must be conducive to the collection of meaningful student
performance evaluation data.

d.  Modification  Ease.  The  model  must  be  designed  so  that  tactical 
and  behavioral  changes  are  not  only  easy  to  make  but  can  also  be  made  in 
real  time  by  the  instructors  during  a  simulation  exercise. 

e. Real-World Fidelity. Real-world fidelity should be maintained
as much as possible. This requirement was considered to be more important
than fidelity to training objectives by the interviewed submarine trainer
instructors. The apparent reason for this preference is that the training
objectives are under the control of the training facility and can be
modified  easily.  However,  if  the  real-world  fidelity  is  sacrificed  for 
training  objectives,  modification  is  considerably  more  difficult.  It  is 
not  clear  that  real-world  fidelity  and  training  objectives  fidelity  are 
SECTION IV
MODELS  DESCRIPTION 


GENERAL 

This chapter presents an overview and some details of four types of
decision models which are potentially appropriate for simulating an intelligent
opponent within the SCSTs. Considering variations of each model,
and  the  possibility  of  combining  models,  many  useful  combinations  can  be 
derived  to  represent  the  intelligent  opponent. 

Since  the  opponent  and  friend  have  essentially  the  same  decision 
structure,  the  same  model  which  is  developed  for  the  opponent  can  also 
model  the  friend.  This  brings  up  a  number  of  interesting  and  useful 
possibilities: 

a.  Play  one  model  against  the  other.  By  doing  this,  it  will  be 
easier  to  debug  the  software.  Also,  it  is  possible  to  develop  a  set  of 
performance  baselines  which  can  be  used  for  further  model  development  and 
to  develop  evaluation  guidelines. 

b. The opponent model easily contains a model of the friend. Further
levels of recursion are possible. For example, the friend can be
aided by an opponent model which contains a friend model.

c. Different models can play each other to evaluate which model is
best.

d. Different parameter values can be set for each model and the
models can play each other in order to evaluate the effectiveness of various
strategies and various assumptions regarding opponent capabilities.

It should be emphasized that when the same model is used for several
purposes, different behavior can be created by varying model parameters.
Even the same model will display different behavior patterns in slightly
different circumstances. Furthermore, some of the model behavior will be
generated randomly (e.g., the specifics of an evasion maneuver), thus
preventing the student from learning a standard response.

POTENTIAL  MODELS 

From an analysis of the requirements of the knowledgeable opponent
model and from an analysis of existing simulation and modeling techniques,
four major approaches have been identified which show potential for model
implementation. These approaches are:

a.  Elicited  probability  approach. 

b.  Adaptive  decision  modeling  approach. 
c. Heuristic search approach.

d.  Production  rules  approach. 

The elicited probability approach to scenario generation is a
derivative of Bayesian analysis. It essentially selects randomly
among  the  alternative  actions  available  at  each  point,  but  the  probability 
of  selecting  each  alternative  is  elicited  from  experts  to  resemble  actual 
behavior. 

The  adaptive  decision  model  is  based  on  the  adaptive  linear  pattern 
recognizer.  The  model  "learns"  the  proper  choices  it  has  to  make  by 
following  those  made  by  an  expert--a  trainer.  It  then  uses  the  trained 
parameters  to  make  the  right  choices  even  in  situations  which  are 
dissimilar  to  those  under  which  it  was  originally  trained. 
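
The learning step described above can be illustrated with a minimal sketch of an adaptive linear classifier. It assumes a perceptron-style error-correction update; the feature encoding, the action names, the learning rate, and the function names are hypothetical and are not taken from the report.

```python
from typing import Dict, List, Sequence, Tuple

Weights = Dict[str, List[float]]
Example = Tuple[Sequence[float], str]


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear score of one action for one situation."""
    return sum(w * f for w, f in zip(weights, features))


def predict(weights: Weights, features: Sequence[float]) -> str:
    """Pick the action whose linear score is highest."""
    return max(weights, key=lambda action: score(weights[action], features))


def train(weights: Weights, examples: List[Example],
          rate: float = 0.5, epochs: int = 20) -> Weights:
    """Adjust weights only when the model disagrees with the expert."""
    for _ in range(epochs):
        for features, expert_action in examples:
            guess = predict(weights, features)
            if guess != expert_action:
                for k, f in enumerate(features):
                    weights[expert_action][k] += rate * f  # reinforce expert choice
                    weights[guess][k] -= rate * f          # penalize wrong choice
    return weights


# Hypothetical features: [bias, contact is close, contact is closing]
examples: List[Example] = [
    ([1.0, 0.0, 0.0], "patrol"),
    ([1.0, 1.0, 0.0], "track"),
    ([1.0, 1.0, 1.0], "evade"),
]
weights: Weights = {a: [0.0, 0.0, 0.0] for a in ("patrol", "track", "evade")}
train(weights, examples)
```

After training on these three instructor decisions, `predict` reproduces the instructor's choice for each of them. How well such a model extends to unseen situations depends on the feature encoding, which this sketch does not address.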

In the heuristic search approach, the problem domain is represented
as a network of "states," each representing a specific tactical situation.
The objective of the CO is to reach some desired goal (mission), which is
also a state in the "state space."

From  the  state  he  is  in,  the  CO  will  perform  a  "Look  ahead"  search 
to  identify  which  alternative  action  open  to  him  will  bring  him  closer  to 
the  goal  state.  This  goal  directed  behavior  is  continued  even  if  the 
state  is  changed  by  external  events  or  actions  of  the  adversary,  thus 
depicting  intelligent  behavior. 
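
As an illustration of the look-ahead search described above, the following sketch uses greedy best-first search, one simple member of the heuristic search family and not necessarily the algorithm the report evaluates. The state names, the transition graph, and the heuristic values are hypothetical.

```python
import heapq
from typing import Callable, Hashable, Iterable, List, Optional


def best_first_search(
    start: Hashable,
    is_goal: Callable[[Hashable], bool],
    successors: Callable[[Hashable], Iterable[Hashable]],
    heuristic: Callable[[Hashable], float],
) -> Optional[List[Hashable]]:
    """Expand the state that the heuristic rates closest to the goal."""
    counter = 0  # tie-breaker so the heap never compares states directly
    frontier = [(heuristic(start), counter, start, [start])]
    visited = set()
    while frontier:
        _, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        if state in visited:
            continue
        visited.add(state)
        for nxt in successors(state):
            if nxt not in visited:
                counter += 1
                heapq.heappush(frontier, (heuristic(nxt), counter, nxt, path + [nxt]))
    return None


# Hypothetical tactical states and the transitions between them.
GRAPH = {
    "undetected": ["far"],
    "far": ["passive_range", "undetected"],
    "passive_range": ["active_range", "far"],
    "active_range": ["torpedo_range", "passive_range"],
    "torpedo_range": [],
}
# Hypothetical heuristic: estimated number of steps to the goal state.
STEPS_TO_GOAL = {"undetected": 4, "far": 3, "passive_range": 2,
                 "active_range": 1, "torpedo_range": 0}

path = best_first_search(
    "undetected",
    lambda s: s == "torpedo_range",
    lambda s: GRAPH[s],
    lambda s: STEPS_TO_GOAL[s],
)
```

If an external event changes the current state, the search can simply be rerun from the new state, which corresponds to the goal-directed behavior described in the text.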

In the production rule approach, the expertise of the problem domain
is represented as "condition--action" chunks. A control mechanism activates
the relevant productions and generates a chain of actions that would
lead from the current situation to the desired goal.
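
A minimal sketch of such a control mechanism follows, assuming forward chaining in which the first applicable production fires on each cycle. The rules, the state keys, and this conflict-resolution policy are hypothetical illustrations, not the report's design.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Rule = Tuple[str, Callable[[State], bool], Callable[[State], None]]

# Each production is (name, condition, action). All three are hypothetical.
RULES: List[Rule] = [
    ("close-to-investigate",
     lambda s: s["contact"] and s["range"] == "far",
     lambda s: s.update(range="passive")),
    ("go-active",
     lambda s: s["range"] == "passive",
     lambda s: s.update(range="torpedo", sonar="active")),
    ("fire-torpedo",
     lambda s: s["range"] == "torpedo" and s["wartime"] and not s["fired"],
     lambda s: s.update(fired=True)),
]


def run(state: State, rules: List[Rule], max_cycles: int = 10) -> List[str]:
    """Fire the first applicable production each cycle until none applies."""
    trace: List[str] = []
    for _ in range(max_cycles):
        for name, condition, action in rules:
            if condition(state):
                action(state)       # the action changes the situation
                trace.append(name)  # the trace records how a conclusion was reached
                break
        else:
            break  # no production applies: stop
    return trace


state: State = {"contact": True, "range": "far", "wartime": True,
                "fired": False, "sonar": "passive"}
trace = run(state, RULES)
```

The returned trace lists the productions in the order they fired, which is one simple way to explain how the system arrived at its final state.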

It  is  clear  that  these  models  are  quite  different  from  each  other. 

The  rest  of  this  chapter  will  describe  them  in  detail  and  specify  the 
advantages  and  disadvantages  of  each  for  our  purpose--the  modelling  of  an 
intelligent  opponent. 

THE  ELICITED  PROBABILITY  APPROACH 

INTRODUCTION. The elicited probability approach to scenario generation
and opponent simulation uses an incremental, discrete description of the
tactical scenario. This description has the form of a state vector $Z^t$.
The vector is made up of components, each representing the state of some
tactical aspect of the situation at a given instant t, thus:

$$Z^t = [Z_1^t, Z_2^t, \ldots, Z_n^t] \qquad (1)$$

In the tactical submarine simulation the components of the state vector
may be:

$Z_1$ = "How deep is the water" (2)

$Z_2$ = "How far is friend"

$Z_3$ = "How many friend's subs are in the area"

The value of a component of the state vector is one of the possible
answers to these questions. Thus $Z_1$ can be, at a given time t, either
"deep," "medium" or "shallow." The value of $Z_2$ may be either "undetected,"
"far," "within passive listening range," "within active sonar range" or
"within torpedo range." The composition of the state vector is determined
by elicitation from experts. The number of discrete values which each
component can assume need not be large; it is determined only by what
makes a tactical difference. If the tactics of the simulated opponent
would be different in "shallow" waters than in "medium" or in "deep"
waters, then only these three discrete values are needed in the tactical
simulation.  Other  components  of  the  state  vector  may  have  more  or  less 
numerous  discrete  values,  again  depending  on  how  many  are  relevant  tacti¬ 
cally.  These  discrete  values  are  used  in  the  intelligent  part  of  the 
simulation--the  part  that  chooses  and  changes  tactical  maneuvers.  The 
part  that  generates  the  actual  display  is  incremental  and  thus  can  generate 
continuous  motions. 
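
A discrete state vector of this kind can be sketched as a small validated record. The field names, the depth thresholds in `discretize_depth`, and the class itself are hypothetical; only the quoted category labels come from the text.

```python
from dataclasses import dataclass

# Category labels quoted in the text for two of the components.
WATER_DEPTH = ("deep", "medium", "shallow")
FRIEND_RANGE = (
    "undetected",
    "far",
    "within passive listening range",
    "within active sonar range",
    "within torpedo range",
)


@dataclass(frozen=True)
class StateVector:
    water_depth: str       # "How deep is the water"
    friend_range: str      # "How far is friend"
    friend_sub_count: int  # "How many friend's subs are in the area"

    def __post_init__(self) -> None:
        if self.water_depth not in WATER_DEPTH:
            raise ValueError(f"unknown water depth: {self.water_depth!r}")
        if self.friend_range not in FRIEND_RANGE:
            raise ValueError(f"unknown friend range: {self.friend_range!r}")
        if self.friend_sub_count < 0:
            raise ValueError("friend_sub_count must be non-negative")


def discretize_depth(depth_feet: float) -> str:
    """Map a continuous depth to a tactical category (thresholds are assumed)."""
    if depth_feet < 300:
        return "shallow"
    if depth_feet < 1500:
        return "medium"
    return "deep"


z = StateVector(discretize_depth(2000.0), "far", 1)
```

Keeping the continuous quantity (depth in feet) outside the vector and discretizing it on entry mirrors the split the text draws between the tactical part of the simulation and the part that generates continuous motion.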

Figure 2 depicts the basic operation and main blocks of the simulation
system. The system goes repeatedly through the following cycle: it starts
from the current state vector $Z^t$ and calculates the state of the world at
the next time interval, $Z^{t+1}$. The calculation is done in two steps. First,
a probability matrix is used to determine, from the current state of the
world, which tactics should be performed. Then, the tactics
chosen are used to transform the current state vector to the vector of the
next time interval, $Z^{t+1}$. This new vector might include an incremental
change in location ($\Delta X$, $\Delta Y$), a change in direction, or the firing of a
torpedo, which is another component of the state vector. The same new
vector is now also used to generate the new outputs that will produce the
new display for the user (interfaces with the current system).

The new value of the state vector is now fed back to the
starting point, where it is used as the current state vector for the next
time interval. Thus, the total process progresses cyclically through this
sequence of steps.
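
The cycle can be sketched as a loop that selects a tactic at random according to state-dependent probabilities and then applies the corresponding transformation operator. The operators, the probability table, and the state keys below are hypothetical placeholders for values that would be elicited from experts.

```python
import random
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Operator = Callable[[State], State]


def turn_right_10(state: State) -> State:
    """Transformation operator: change course by 10 degrees to the right."""
    return {**state, "course": (state["course"] + 10) % 360}


def hold_course(state: State) -> State:
    """Transformation operator: leave the state unchanged."""
    return dict(state)


OPERATORS: Dict[str, Operator] = {
    "turn right 10": turn_right_10,
    "hold course": hold_course,
}

# Hypothetical selection probabilities, keyed by one state component.
TACTIC_PROBS: Dict[str, Dict[str, float]] = {
    "far": {"turn right 10": 0.2, "hold course": 0.8},
    "within torpedo range": {"turn right 10": 0.7, "hold course": 0.3},
}


def select_tactic(state: State, rng: random.Random) -> str:
    """Step 1: draw a tactic using the probabilities for the current state."""
    probs = TACTIC_PROBS[state["friend_range"]]
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


def run_cycles(state: State, steps: int, rng: random.Random) -> List[State]:
    """Repeat select-then-transform, feeding each new state back in."""
    history = [state]
    for _ in range(steps):
        tactic = select_tactic(state, rng)
        state = OPERATORS[tactic](state)  # Step 2: transform the state vector
        history.append(state)
    return history


history = run_cycles({"course": 350, "friend_range": "far"}, 8, random.Random(1))
```

Each entry in `history` could then be handed to the display-generation side of the simulator. Passing in a seeded `random.Random` makes a run repeatable for debugging while still varying between training sessions.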

UPDATING THE STATE VECTOR. The actual calculation of the changes of the
state vector is somewhat more complex than what was described above. The
complexity is necessary to provide some randomness in the simulated
behavior to prevent the trainee from learning a prerecorded scenario.
The randomness is generated from probability information elicited from
experts, and thus the behavior produced would be typical and similar to
an opponent commander's behavior but would still be unpredictable in its
details.
Figure 3 shows in more detail the specific steps that are taken
in calculating the state vector at time t+1 from $Z$ at time t. The current
state vector $Z^t$ is used to select, by combining conditional probabilities,
the tactics to be applied. Let us define the terms more precisely before
using them. Let us denote the vector of all the tactics that are performed
at time t by:

$$T^t = [T_1^t, T_2^t, \ldots, T_m^t] \qquad (3)$$

$T_1$ might be "turn right 10°," $T_2$ might be "decrease depth to periscope
level." More than one activity can take place at the same time, so
the vector format is needed to combine the effect of all in each time
interval. To determine which tactics should be selected in the next
interval a conditional probability is needed: $P(T^{t+1} \mid Z^t)$. This conditional
probability answers the following question: given the current situation
$Z^t$, what tactics $T^{t+1}$ should be applied? $Z^t$ and $T^{t+1}$ are vectors with many
elements, and that makes the conditional probability a matrix of the form:

Pd^Yz1)  =  { ! zj)}  j  j  (4) 

In  every  row  i,  which  corresponds  to  a  tactics  T^,  the  entries  indicate, 
the  conditional  probability  of  selecting  these  tactics  given  that  the  Z. 
component  of  the  state  vector  is  present.  For  instance,  one  entry  might1 
be  the  answer  to:  Ifhat  is  the  probability,  given  that  friend  is  "in 
torpedo  range"  that  the  tactics  "shoot  a  torpedo"  be  applied.  There  are 
two  problems  with  this  approach.  One  is  the  independence  of  the  state  . 
vector  elements,  i.e.,  whether  the  conditional  probability  of  a  tactics  T. 
given  Zj  is  independent  of  the  other  components  of  Z.  The  other  problem1 
is  meaningfulness  to  the  expert.  For  example,  a  question  like:  Ifhat  is 
the  probability  of  choosing  a  "zigzag  maneuver  to  the  right"  given  "enemy 
sub  is  nuclear?"  Posing  the  question  the  other  way  around  should  prove 
much  more  meaningful:  Given  a  tactics  T ■  what  set  of  events  would  cause 
you  to  choose  it?  The  natural  question  to  an  expert  is  the  conditional 
probability  matrix: 

P(7t|Tt+l)  =  {PtfjlTJ+1)}i j  (5) 

This  matrix  of  probabilities  is  obtained  from  experts  in  submarine  tactics. 
The  expert  estimates  can  be  based  upon  experience,  upon  real  world 
measurements,  upon  theoretical  models,  etc.  It  is  also  possible  to 
determine  the  conditional  probabilities  by  collecting  statistics  during 
an  actual  training  session  in  which  the  instructors  are  controlling 
opponent  actions. 

To  calculate  the  conditional  probability  in  (4)  from  the  estimated 
conditional  probability  given  in  (5)  the  following  formula  has  to  be 
used: 

$$P(T_j \mid \mathbf{Z}^{t}) = \frac{P(T_j)\, P(\mathbf{Z}^{t} \mid T_j)}{P(\mathbf{Z}^{t})} \tag{6}$$

Figure 3. Detailed System Block Diagram

This formula, basic in Bayes probability theory, combines the conditional
probabilities $P(z_i^{t} \mid T_j)$ to give $P(T_j \mid \mathbf{Z}^{t})$.

Two additional vectors of a priori probabilities, also estimated by
experts, are required. The components of the first vector, $P_{0T}$, are the
a priori probabilities that each state transformation operator will be
selected. They are represented as follows:

$$P_{0T} = [p_0(T_1), p_0(T_2), \ldots, p_0(T_m)] \tag{7}$$

The components of the second vector, $P_{0Z}$, are the a priori probabilities
of the occurrence of each state component of the $\mathbf{Z}$ vector. They are
represented as follows:

$$P_{0Z} = [p_0(z_1), p_0(z_2), \ldots, p_0(z_n)] \tag{8}$$

The  a  priori  probabilities  don't  have  to  be  estimated  with  great 
precision  because,  as  the  scenario  unfolds,  they  have  less  and  less 
effect  over  the  behavior  of  the  scenario. 


If we assume independence of the impact of the different components
of the state vector then:

$$P(\mathbf{Z}^{t} \mid T_j) = \prod_{i=1}^{n} p(z_i^{t} \mid T_j) \tag{9}$$

Thus, equation (6) becomes

$$P(T_j \mid \mathbf{Z}^{t}) = \frac{p(T_j) \prod_{i=1}^{n} p(z_i^{t} \mid T_j)}{\prod_{i=1}^{n} p(z_i^{t})} \tag{10}$$

When equation (10) is implemented, the $p(T_j \mid \mathbf{Z}^{t})$ are normalized; thus, the
denominator in (10) is not needed.


Table 1 is a partial example of the probabilities as they are
elicited from the experts, before they are used in formula (10) to obtain
the conditional probabilities $P(T_j^{t+1} \mid \mathbf{Z}^{t})$. The leftmost column shows the
components of the state vector and the values that they can assume.
The list of useful tactics is indicated on the top. The first column of
numbers and the first row indicate the a priori probability of each state
vector component value and each tactics. The body of the table contains
the conditional probabilities. Looking at the second row of numbers, the
probability of friend being undetected is 0.9 if the tactics is "proceed"
but it is 0.0 if the tactics is "run." This makes sense, because if
friend is undetected, there is no reason to choose "run." Naturally, each
row sums up to 1 because if that particular state variable is present, it
must have some succeeding action, even if it is only "proceed."
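
The aggregation in equation (10) can be illustrated with a minimal Python sketch. It assumes the independence condition of equation (9), drops the denominator, and normalizes afterward, as the text describes. The function name and the dictionary layout are hypothetical; the numbers are the Table 1 entries for the rows "Undetected" and "Has Friend Detected: No," restricted to the three tactics shown in the table.

```python
from math import prod


def tactic_posteriors(priors, likelihoods, observed):
    """Numerator of equation (10), followed by normalization.

    priors:      {tactic: p(T_j)}
    likelihoods: {tactic: {state_value: p(z_i | T_j)}}
    observed:    the state-vector component values currently present
    """
    scores = {
        tactic: priors[tactic] * prod(likelihoods[tactic][z] for z in observed)
        for tactic in priors
    }
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("no tactic has nonzero probability for this state")
    return {tactic: score / total for tactic, score in scores.items()}


# Entries taken from Table 1 (three tactics only; key names are illustrative).
PRIORS = {"hide": 0.05, "run": 0.1, "proceed": 0.7}
LIKELIHOODS = {
    "hide":    {"undetected": 0.05, "not_detected": 0.10},
    "run":     {"undetected": 0.0,  "not_detected": 0.15},
    "proceed": {"undetected": 0.9,  "not_detected": 0.45},
}

posterior = tactic_posteriors(PRIORS, LIKELIHOODS, ["undetected", "not_detected"])
```

With these entries "run" receives zero probability and "proceed" dominates, which agrees with the reading of the table given above.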


NAVTRAEQUIPCEN 78-C-0107-1

TABLE 1. MANEUVER SELECTION MATRIX

State                     A Priori    Hide    Run     Proceed

A Priori Probability                  0.05    0.1     0.7

How Far Is Friend:
  Undetected              0.60        0.05    0.0     0.9
  Very Near               0.15        0.1     0.1     0.1
  Near                    0.15        0.?     0.15    0.05
  Medium                  0.05        0.3     0.?0    0.15
  Far                     0.05        0.35    0.20    0.25

Has Friend Detected:
  Yes                     0.15        0.2     0.1     0.1
  Possible                0.15        0.20    0.20    0.2
  No                      0.70        0.10    0.15    0.45

War State:
  War                     0.01        0.1     0.1     0.5
  Peace                   0.99        0.1     0.1     0.7     0.3

Water Depth:
  Shallow                 0.1         0.3     0.1

The assumption that the variables which comprise the state vector
are independent is a crucial one. The most practical way to meet this
condition is to take care to define the state vector such that it is
independent. If there are dependencies in the state vector, they may
not noticeably affect the behavior of the scenario (e.g., environment,
opponent's actions). This can be tested by using the model to generate
behavior which is viewed by the person from whom the probabilities were
elicited. If the behavior is not as desired, the elicited probability
values can be fine-tuned until the proper behavior is obtained.

One technique of handling dependencies in the state vector is to
also elicit the covariance matrix representing the correlation among
state variables. This matrix can then be used in one of two methods:

a.  The  problem  is  transformed  into  a  domain  where  independence 
holds  (by  proper  selection  of  independent  tactically  significant  state 
vector  components). 

b.  The  covariance  matrices  are  used  to  derive  weights  to  compensate 
for  dependence. 

Both  methods  have  several  disadvantages: 

a.  The  covariance  matrices  are  dependent  on  the  order  of  processing 
state  variables;  a  different  covariance  matrix  must  be  used  for  each 
order. 

b.  The  covariance  matrices  involve  either  asking  people  to  estimate 
means  and  standard  deviations,  or  polling  a  group  of  experts  and  collecting 
these  statistics. 

c.  When  the  probabilities  are  subjectively  determined  (by  elicita¬ 
tion),  the  precision  of  the  problem  is  such  that  the  covariance  matrices 
may  be  meaningless. 

In  general,  the  complexity  of  using  the  covariance  matrices  seems  to 
exceed  that  justified  by  meaning  and  relevance. 

Another method of handling dependencies in the state vector is to
construct a new set of variables based on permutations of some of the
dependent variables. This approach is simple, but leads to a rapid
increase in the size of the state vector.

Going back to Figure 3, formula (10) is used to obtain, from the current
state vector, the tactics probability vector (TPV):

$$P(\mathbf{T}^{t+1}) = [P(T_1^{t+1}), P(T_2^{t+1}), \ldots, P(T_m^{t+1})] \tag{11}$$

This vector indicates the probability of selecting tactics $T_j$ in the
current tactical situation. The next step is to select the tactics to be
actually applied. This can be done in several ways:

a. Select the tactics with the highest probability.

b. Select all the tactics with probability higher than some threshold
level.

c. Select the tactics randomly but in such a way that the probability
to select a particular tactics is proportional to its $P(T_j^{t+1})$. (This is
the "Monte Carlo" method.)


After the tactic (or combination of tactics) is selected, the next
step is to actually apply the tactics. In terms of the model, we will
apply a transformation to the current state vector to obtain the
new one:

$$\mathbf{Z}^{t+1} = \mathbf{Z}^{t}\,[\mathbf{T}^{t+1}] \qquad ([\mathbf{T}] \text{ a matrix}) \tag{12}$$

There are virtually no restrictions on the kinds of state transformation
operators which can be defined. A transformation operator may
affect a single state variable and generate a constant output. It may
also affect a large number of state variables and make use of a complex
decision strategy to determine their values. The transformation operator
may even determine the value of a variable for several subsequent time
cycles.

A transformation operator may make use of subsets of $\mathbf{Z}^{t}$ which were
not used in selecting the operator. An operator may also make internal
use of Bayesian aggregations based upon additional conditional probability
matrices and subsets of $\mathbf{Z}^{t}$. Thus, hierarchies of transformation operators
can be established.


Each transformation operator affects a set of one or more state
variables. The operators, in turn, are grouped according to which set
of variables they affect. These sets of variables must be disjoint
because, after a single operator is selected from each set, the selected
operators are assumed to be invoked simultaneously. If the sets of
variables are not disjoint, the order in which the selected operators are
actually invoked will affect the value of the transformed state vector.
However, non-disjoint sets of variables can be handled by establishing a
hierarchy of operators within a "higher level" operator.
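
The disjointness requirement can be made concrete with a short Python sketch. It assumes, purely for illustration, that the state vector is a dictionary and that each operator is a pair consisting of the variables it owns and a function of the current state; the operator names and the numeric values (including the depth figure) are hypothetical. Every operator reads the same old state, so the result does not depend on invocation order, and overlapping variable sets are rejected.

```python
def apply_operators(state, selected):
    """Apply one selected operator per operator set "simultaneously".

    state:    {variable_name: value}, the current state vector
    selected: list of (variables, func) pairs; func reads the current
              state and returns new values for its own variables
    """
    touched = set()
    updates = {}
    for variables, func in selected:
        overlap = touched & set(variables)
        if overlap:
            raise ValueError(f"operator sets are not disjoint: {sorted(overlap)}")
        touched |= set(variables)
        new_values = func(state)  # every operator sees the same old state
        updates.update({name: new_values[name] for name in variables})
    return {**state, **updates}


# Hypothetical operators, loosely modeled on the tactics named in the text.
turn_right = (("heading",), lambda s: {"heading": (s["heading"] + 10) % 360})
go_to_periscope_depth = (("depth",), lambda s: {"depth": 60})

state = {"heading": 355, "depth": 300, "speed": 8}
new_state = apply_operators(state, [turn_right, go_to_periscope_depth])
```

Variables that no selected operator owns (here, "speed") are carried over unchanged.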

The selection of one state transformation operator from each operator
set is made by means of a Monte Carlo selection procedure. The probabilities
of occurrence of each operator in the set are normalized to obtain
a discrete cumulative distribution function. A uniformly distributed
pseudorandom number in the range [0,1] is then generated and its position
in the distribution function is used to select the operator. Alternatively,
the operator with the highest probability could be selected.
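
A minimal Python sketch of this selection procedure follows. The function names are hypothetical, and the random source is passed in as a parameter only so that the sketch can be exercised deterministically; any object with a random() method returning a uniform number will do.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def monte_carlo_select(probabilities, rng=random):
    """Pick one operator with probability proportional to its weight.

    probabilities: {operator_name: probability}, not necessarily normalized
    rng:           any object with a random() method returning u in [0, 1)
    """
    names = list(probabilities)
    weights = [probabilities[name] for name in names]
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one operator must have positive probability")
    # Normalize, then build the discrete cumulative distribution function.
    cdf = list(accumulate(w / total for w in weights))
    u = rng.random()
    # Position of u in the cumulative distribution; the clamp guards
    # against round-off leaving the last entry slightly below 1.
    index = min(bisect_right(cdf, u), len(names) - 1)
    return names[index]


def most_probable(probabilities):
    """The deterministic alternative: take the highest-probability operator."""
    return max(probabilities, key=probabilities.get)
```

For example, monte_carlo_select({"hide": 0.05, "run": 0.1, "proceed": 0.7}) returns "proceed" most of the time but occasionally one of the others, which is the source of the unpredictability the approach is after.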

In some experimental applications, it may be useful or necessary to
know the probability that a state variable will have a particular value,
$p(z_i^{t+1} \mid \mathbf{Z}^{t})$. By restricting the kinds of allowable state transformation
operators to those that generate a constant (and unique) result, it is
possible to obtain these probabilities directly from the scenario generator.
If state transformation operator $T_j$ outputs the same value for
$z_i^{t+1}$ whenever it is invoked, and only $T_j$ outputs that value, then

$$p(z_i^{t+1} \mid \mathbf{Z}^{t}) = p(T_j^{t+1} \mid \mathbf{Z}^{t}) \tag{13}$$

If more complex transformation operators are used, $p(z_i^{t+1} \mid \mathbf{Z}^{t})$ becomes
more difficult to compute. A value can always be obtained, however, by
making statistical measurements of the behavior of the scenario generator.

The current state vector, $\mathbf{Z}^{t}$, is transformed into $\mathbf{Z}^{t+1}$ by the
(assumed) simultaneous invocation of all of the selected state transformation
operators. If the state vector is properly designed, it is possible
to use the Bayesian/Monte Carlo selection mechanism to choose all of
these operators. However, in many instances it may be more convenient
to use "external" mechanisms to select transformation operators for certain
subsets of the state vector. These externally controlled state
vector subsets will be collectively referred to as the $F^{t}$ subvector (see
Figure 3). Examples of externally controlled state variables would include
clock-driven variables such as day and night, high and low tides,
and events which occur on a fixed schedule.

PROBABILITY  ELICITATION.  Previous  research  has  shown  that  human  experts 
are  good  at  estimating  conditional  probabilities,  but  poor  at  aggregating 
them  (e.g.,  Edwards,  1962).  Accordingly,  the  present  scenario  generator 
uses  conditional  probabilities  elicited  from  experts  and  aggregates  them 
automatically.  First,  expert  inputs  are  used  to: 

a.  Describe  the  environment  to  be  modeled  in  terms  of  relevant 
state  variables. 

b.  Determine  which  variables  are  externally  controlled  and  which 
are  controlled  by  the  Bayesian  model. 

c. Define all of the transformations which change the state variables.

Then, the expert is queried in detail to:

d. Estimate the a priori probabilities and the individual conditional
probabilities which constitute the entire matrix.

The method of elicitation is simply to interview the expert and ask him
the probabilities. Bond and Rigney (1966) were able to elicit almost
650 conditional probabilities associated with electronic troubleshooting
in one hour using a simple questionnaire.

The  process  of  probability  elicitation  is  an  iterative  one  which 
allows  the  expert  to  refine  his  estimates.  That  is,  once  the  initial 
estimates  are  made,  test  scenarios  are  generated  which  allow  the  expert 
to  see  the  consequences  of  his  estimates.  He  is  then  asked  to  modify  his 
estimates  to  make  them  more  consistent  with  the  desired  behavior  of  the 
scenario generator.

ELICITED PROBABILITY APPROACH - SUMMARY.

Advantages

a. Simplicity; easy to develop, maintain, and implement.

b.  Generates  a  probabilistic  opponent  and  environment. 

c.  Weights  representing  behavior  are  easy  to  elicit  and  to  alter. 

d.  State  oriented;  easy  to  switch  between  manual  and  automatic 
operation. 

Disadvantages 

a.  It  is  difficult  to  alter  structural  aspects  due  to  the  need  to 
avoid  dependencies  in  the  state  vector. 

b.  Difficult  to  insert  logical  statements  to  control  the  scenario. 

c.  The  application  of  state  transformation  operators  may  be  order 
dependent. 

d. It is difficult to isolate the particular entry in the transformation
matrix that caused some behavior and to give it a tactical
interpretation.

THE ADAPTIVE DECISION MODELING APPROACH

INTRODUCTION. The adaptive decision approach to generating knowledgeable
opponent behavior--which uses pattern recognition--is based on learning
opponent decision modeling and utility theory. In the present application
all of the relevant information for selecting the opponent's next action
is immediately available at the time it's needed. The model, which is
first adapted to choices made by an expert, is then used to calculate
the value of each alternative, and the alternative with the highest value
is chosen for actual execution by the system.

ADAPTIVE DECISION MODELING. Work on adaptive decision-making is derived
from the areas of behavioral decision research and AI experience with
learning networks. The unique aspect of this approach is the capability
to adjust model parameters on-line and change decision strategy accordingly.
In essence, the learning system attempts to identify the decision process
of the human operator on-line by (a) successive observation of his
actions, and (b) establishment of an interim relationship between the
input data set and the output decision (the model). Learning in this
context refers to a training process for adjusting model parameters
according to a criterion function. The object is to improve model
performance as a function of experience, or to match the model
characteristics to those of the operator.

Learning techniques have been used to model the decision strategy
and to identify the sources of cognitive constraints on the human
operator performing a dynamic prediction task (Rouse, 1972). Another
example of an adaptive model of the human operator through real time
parameter tracing has been reported by Gilstad and Fu (1970). Linear
and piecewise-linear discriminant functions were used to classify
system gains, errors and error rate. The decision boundaries for
classification were determined through a process of on-line learning,
observing operator performance and parameter adjustment. The specific
model used was applicable only to very limited tasks, and merely
illustrated the feasibility of the technique.

A unique advantage of using a learning system lies in its capability
to act as a pattern classification mechanism. As such, it can be used
to identify biases in operator decision policy as a response to classes
or patterns in the input data (Tversky, et al., 1972). In conventional
Bayesian technique, the pattern of events is decomposed into elementary
data points. With the assumption of independence, the elementary data
points are aggregated to revise the hypothesis. Effects of the data
pattern do not bear on the decision.

In  dynamic  decision  making,  however,  the  temporal  and  spatial  nature 
of  the  data  are  highly  significant.  Since  decision  data  appear  as  a 
pattern  of  individual  events,  it  is  reasonable  to  assume  that  the  subject 
responds  to  the  pattern  as  well  as  to  the  individual  value.  In  fact,  the 
pattern  may  contain  the  greater  amount  of  information.  Classification  of 
input  patterns  by  the  learning  mechanism  can  be  accomplished  by  programmed 
cognizance  of  such  data  features  as:  data  with  non-independent  events, 
data  with  correlated  events,  data  with  events  which  continuously  vary 
with  time,  the  number  of  elements  of  decision  data  and  the  rate  of  change 
in  the  data  points. 

THE MAU MODEL. Multi-attribute decision analysis is the most widely used
approach for making evaluations involving multiple criteria. MAU methods
decompose the complex overall evaluation problem into more manageable
subproblems of scaling, weighting, and combining criteria. In doing so,
the  MAU  methods  provide  a  rich  framework  for  analysis,  discussion,  and 
feedback.  This  "divide  and  conquer"  approach  to  evaluation  involves 
defining  the  problem,  identifying  relevant  dimensions  of  value,  scaling 
and  weighting  the  dimensions,  and  finally  aggregating  the  dimensions  into 
a  single  figure  of  merit  for  the  system. 

The  power  of  the  multi-attribute  approach  lies  in  its  level  of 
analysis  and  flexibility.  Sensitivity  analyses  of  the  level  and  weight 
of  each  dimension  can  provide  indications  of  what  aspects  to  concentrate 
tests  on,  or  what  system  elements  to  modify.  Flexibility  is  present, 
since criteria can be added or deleted as necessary. Also, the weights
and levels can be quickly adjusted according to new functional
requirements and capabilities.

In the MAU model, the consequences of every action are considered to
be decomposable according to a single common set of attributes. The
model computes an aggregate multi-attribute utility (MAU) as a weighted
sum of each consequence attribute level ($A_{ik}$) multiplied by the importance
or utility of the attribute ($W_i$). The calculated MAU of each action is
used as the selection criterion:

$$MAU_k = \sum_i W_i A_{ik} \tag{14}$$

where

$MAU_k$ = the aggregate utility of option k,

$W_i$ = the importance weight of attribute i, and

$A_{ik}$ = the level of attribute i for action k.

Figure 4 shows the major components of the MAU model in block diagram
form. Possible actions are parameterized in terms of attribute
levels. The MAU calculator uses as inputs (1) the attribute levels of
the given action, and (2) a vector of "attribute weights" which have
been dynamically estimated for a given operator by an adaptive model.

Calculation of the multi-attribute utility for each action is
central to the operation of the model. The MAU calculation is shown in
Figure 5. The dot-product of the attribute level vector and the attribute
weight vector provides the aggregate MAU value. The attributes are
scaled so that each attribute level ranges from 0 to 1. Further, the
orientation is arranged such that each attribute contributes positively
to the overall aggregate MAU. That is, holding all other attribute
levels constant, an increase in any attribute level increases the MAU.
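
The dot-product calculation and the choice of the highest-valued action can be sketched in a few lines of Python. The attribute names, weights, and levels below are invented for illustration and do not come from the report; only the weighted-sum form and the 0-to-1 scaling are taken from the text.

```python
def mau(weights, levels):
    """Aggregate utility of one action: sum over attributes of W_i * A_ik."""
    if len(weights) != len(levels):
        raise ValueError("weights and levels must cover the same attributes")
    if any(not 0.0 <= level <= 1.0 for level in levels):
        raise ValueError("attribute levels are expected to be scaled to [0, 1]")
    return sum(w * a for w, a in zip(weights, levels))


def best_action(weights, actions):
    """Return the action whose attribute levels give the highest MAU."""
    return max(actions, key=lambda name: mau(weights, actions[name]))


# Hypothetical attributes: concealment, position advantage, speed.
WEIGHTS = [0.5, 0.3, 0.2]
ACTIONS = {
    "hide":    [0.9, 0.2, 0.1],
    "run":     [0.3, 0.4, 0.9],
    "proceed": [0.5, 0.8, 0.4],
}

choice = best_action(WEIGHTS, ACTIONS)
```

Because every weight is positive and every level is oriented so that larger is better, raising any single level can only raise the action's MAU, which is the monotonicity property stated above.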

ATTRIBUTE  CHOICE.  The  determination  of  attributes  to  include  in  the 
decision  model  is  probably  of  greater  importance  than  the  accurate  assess¬ 
ment  of  the  importance  weights  (Dawes,  1975).  The  following  list  of 
desirable  characteristics  for  the  attributes  expands  on  Raiffa's  (1969) 
recommendations  of  attribute  independence,  set  completeness,  and  minimum 
dimensionality: 

a.  Accessible.  The  levels  of  each  factor  should  be  easily  and 
accurately  measurable. 

b. Conditionally Monotonic. The factor level should be monotonic
with the criterion (preference) regardless of the constant values of
other factors.

c. Value Independent. The level of one attribute should not depend
on the levels of the other attributes. This is to some extent a
consequence of recommendation b.

d. Complete. The set of attributes should present the operator's
behavior as completely as possible.

e. Meaningful. The attributes should be reliable and should demonstrate
construct validity. Feedback based on the model attributes should
be understandable to the operator.

Figure 4. Overview of Action Selection Model

For the most part, these recommendations result in an attribute set
that is measurable, predictive, and in accord with the axioms of utility
theory. The recommendations also imply a limitation on the number of
possible attributes. The requirements of independence and meaningfulness
render any large set of attributes unrealizable, because of the cognitive
limitations of the human operator.

ADVANTAGES  OF  THE  MULTI-ATTRIBUTE  UTILITY  MODEL.  The  multi-attribute 
information  utility  model  presented  here  is  characterized  by  several 
attractive  features.  These  features,  itemized  below,  offer  substantial 
advantage  over  the  EU  decision  model.  The  advantages  arise  out  of  the 
theoretical  structure  of  the  model,  especially  its  decomposition  property, 
and  have  all  been  empirically  demonstrated  to  some  degree  in  ongoing 
Perceptronics  programs  (Samet,  Weltman,  and  Davis,  1976;  Steeb,  Chen  and 
Freedy,  1977). 

a. Generality. The adaptive, multi-attribute model for information
selection holds a considerable amount of generality. It can be applied
in situations where diagnostic actions can be decomposed into a small set
of manageable, quantifiable attributes which have two critical
characteristics. First, they must be logically related to the situation-specific
demands. That is, their relevance to specific situations must be known.
Second,  they  must  directly  impact  upon  a  decision  maker's  choices  among 
competing  options.  A  number  of  military  decision-making  environments  have 
already  been  demonstrated  to  fit  this  paradigm  (e.g.,  Coats  and  McCourt, 
1976;  Hayes,  1964;  McKendry,  Enderwick  and  Harrison,  1971;  Samet,  1975). 

b. Parsimony. The model is parsimonious; it need only assess an
operator's weights for a limited number of information dimensions or
attributes. Besides significantly minimizing the model's computational
needs and software complexity, this feature reflects findings of
psychological experiments (e.g., Hayes, 1964; Slovic, 1975; Wright, 1974) and is
in agreement with contemporary decision theory (e.g., Tversky and
Kahneman, 1974), all of which suggest that a decision maker can only
perform weighting and aggregation on a relatively small number of the
important dimensions in the decision task. Also, when decisions are based
on a manageable number of information dimensions, they are easier to
communicate and rationalize--especially in group decision-making situations
(Gardiner and Edwards, 1975). In complex situations, therefore, the
reduction in the number of model parameters in the proposed MAU model as
compared to the expected utility model is of major importance.

c. Robustness. Like other linear composition models, the
multi-attribute decision model is robust; that is, its performance is not
significantly degraded by small perturbations in the model's parameters
(Dawes and Corrigan, 1974). Such robustness probably contributes to the
finding that multi-attribute utility assessment techniques have proven,
in certain instances, to be more reliable and valid than direct
assessment procedures (Newman, 1975; Samet, 1976).

d.  Speed  of  Adaptation.  The  adaptive  model  adjusts  all  parameters 
with  each  incorrectly  predicted  trainer  decision  (i.e.,  action  selection). 
Thus,  weights  for  a  specific  attribute  can  be  obtained  rapidly  during 
sessions  in  which  the  trainer  performs  the  simulated  CO  decisions. 

e. Flexibility. The multi-attribute utility model is inherently
flexible. If accurate prediction of action selection is not sufficient
(i.e., if attribute weights cannot be trained to stable values),
additional features or attributes can be added and inappropriate ones deleted.
The  response  to  dynamic  changes  in  conditions  is  similarly  flexible.  In 
instances  where  conditions  change  rapidly  and  radically,  new  sets  of 
weights  trained  for  the  new  conditions  can  be  substituted.  Such  weight 
vectors  could  be  prepared  ahead  of  time  by  training  them  either  in  actual 
operational  situations  or  in  step-through  simulations. 

UTILITY  ESTIMATOR.  The  dynamic  utility  estimation  technique  is  based  on 
a  trainable  pattern  classifier.  Figure  5  illustrates  the  mechanism.  As 
the  operator  performs  the  task,  the  on-line  utility  estimator  observes  his 
choice  among  the  available  actions  at  each  point  in  the  sequence  and  views 
his  decision-making  as  a  process  of  classifying  patterns  consisting  of 
varying  attribute  levels.  The  utility  estimator  attempts  to  classify 
the  attribute  patterns  by  means  of  a  linear  evaluation  (discriminant) 
function.  These  classifications  are  compared  with  the  operator's  choices. 
Whenever they are incorrect, an adaptive, error-correction training
algorithm is used to adjust the utilities. A comprehensive discussion of
this technique can be found in Freedy, Davis, Steeb, Samet, and Gardiner
(1976).

TRAINING ALGORITHM. On each trial, the model uses the previous utility
weights ($W_i$) for each attribute (i) to compute the multi-attribute
utilities ($MAU_k$) for each action (k). Thus,

$$MAU_k = \sum_i W_i A_{ik} \tag{15}$$

where

$W_i$ is the weight of the i-th attribute, and

$A_{ik}$ is the level of the i-th attribute associated with action k.

The model predicts that the operator will always prefer the action
with the maximum MAU value. If the prediction is correct (i.e., the
operator chooses the action with the highest MAU), no adjustments are made
to the utility weights. However, if the operator chooses an action having
a lower MAU value, the algorithm goes into action and applies the error
correction training formula. In this manner, the utility estimator
"tracks" the operator's decision-making strategy and learns his utilities
or weights for the attributes. The training rule used to adjust the
weights associated with each of the attributes is illustrated in Figure 5.

Actual in-task training appears feasible using pattern recognition
techniques. Instead of batch processing, the pattern recognition methods
refine the model decision-by-decision. Briefly, the technique considers
the decision maker to respond to the characteristics of the various
alternatives as patterns, classifying them according to preference. A
linear discriminant function is used to predict this ordinal response
behavior, and when amiss, is adjusted using error correcting procedures.
This use of pattern recognition as a method for estimation of decision
model parameters was apparently first suggested by Slagle (1971). He
made the key observation that the process of expected utility maximization
involved a linear evaluation function that could be learned from a
person's choices.

The suggested technique was soon applied by Freedy, Weisbrod, and
Weltman (1973) to the modeling of decision behavior in a simulated
intelligence gathering context. Freedy and his associates assumed the
decision maker to maximize expected utility on each decision. They assigned
a distinct utility, $U(x_{jk})$, to each possible combination of action and
outcome, as shown in the decision tree in Figure 6. The probabilities
of occurrence of each outcome j given each action k were determined using
Bayesian techniques. These patterns of probability were used as inputs
to the estimation program (Figure 7). The expected utility of each
action $A_k$ was then calculated by forming the dot product of the input
probability vector and the respective utility vector. This operation is
equivalent to the expected utility calculation:

$$EU(A_k) = \sum_j P(x_{jk})\, U(x_{jk}) \tag{16}$$

The classification weight vector $W_{jk}$ in the pattern recognition
program acts as the utility $U(x_{jk})$. The alternative $A_k$ having the maximum
expected utility is selected by the model and compared with the decision
maker's choice. If a discrepancy is observed an adjustment is made, as
shown in Figure 5. The adjustment moves the utility vectors of the
chosen, and predicted, actions ($W_c$ and $W_p$, respectively) in the direction
minimizing the prediction error. The adjustment consists of the following:


$$W'_c = W_c - d \cdot P_p \tag{17}$$

$$W'_p = W_p + d \cdot P_c \tag{18}$$

where

W'_c is the new vector of weights [W(x_1c), W(x_2c)] for action c
W_c is the previous weight vector for action c
d is the correction increment
P_k is the probability vector [P_1k, P_2k, ..., P_nk] describing the
distribution of outcomes resulting from action k

The model is an adaptation of the R-category linear machine (Nilsson,
1965). The pattern classifier receives patterns of descriptive data
(outcome probabilities) and responds with a decision to classify each of the
patterns in one of R categories (actions). The classification is made on
the basis of R linear discriminant functions, each of which corresponds
to one of the R categories. The discriminant functions are of the form:


$$g_i(x) = W_i \cdot x \quad \text{for } i = 1, 2, \ldots, R \tag{19}$$


where x is the pattern vector and W_i is the weight vector. The pattern
classifier  computes  the  value  of  each  discriminant  function  and  selects  the 
category  i  such  that 


$$g_i(x) > g_j(x) \quad \text{for all } j = 1, 2, \ldots, R;\; j \neq i \tag{20}$$

A geometric interpretation of the R-category linear machine is shown
in Figure 8 (Nilsson, 1965). Decisions involving two possible consequences,
x_1 and x_2, are evaluated according to three discriminant functions G_1(x),
G_2(x), and G_3(x). The lines of intersection between the discriminant
hyperplanes are the points of indifference between actions. Mappings of
these lines of intersection to the attribute plane are shown in the figure.
The resulting regions R_1, R_2, and R_3 correspond to the actions maximizing
the (expected utility) evaluation function.

The R-category technique becomes somewhat cumbersome if a large number
of actions are possible or if the decision circumstances change rapidly.
This problem is a result of the assignment of a distinct, holistic utility
to each tip of the decision tree. The number of model parameters thus
increases rapidly with an increase in the number of actions possible. Also,
the only weight vectors adjusted in a given decision are those corresponding
to the model-predicted and the actually chosen actions. This partial
adjustment makes the system somewhat unresponsive to change.

A natural extension of Freedy's approach is to adapt the
single-discriminant, multi-attribute approach to the modeling of objective
choice behavior. Each possible outcome of a decision can be associated with a
set of attributes or objectives of the decision maker. An importance
weight vector defined over the various attributes can then be adjusted to
predict behavior. The mechanism is simply that of a threshold. The
adjustment rule following an incorrect prediction is given in Equation 21,
with the parameter d controlling the sensitivity of the correction. A
large d will cause a fast adjustment but may result in overshoot and
oscillations, while a small d will cause slow adaptation.

$$W' = W + d \, (x_c - x_p) \tag{21}$$

where

W' is the updated weighting vector
W is the previous weighting vector
x_p is the attribute pattern of the model-predicted choice
x_c is the attribute pattern of the decision maker's choice
d is the adjustment factor.
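
The adjustment in Equation 21 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the two-attribute alternatives, and the value d = 0.1 are assumptions made for the example, not part of the original model.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def predict(weights, alternatives):
    """Index of the alternative with the highest weighted attribute sum."""
    scores = [dot(weights, x) for x in alternatives]
    return scores.index(max(scores))


def adjust(weights, x_chosen, x_predicted, d):
    """Equation 21: W' = W + d (x_c - x_p)."""
    return [w + d * (c - p) for w, c, p in zip(weights, x_chosen, x_predicted)]


def train_step(weights, alternatives, chosen_index, d=0.1):
    """Adjust the weights only when the prediction disagrees with the choice."""
    predicted_index = predict(weights, alternatives)
    if predicted_index == chosen_index:
        return weights
    return adjust(weights, alternatives[chosen_index],
                  alternatives[predicted_index], d)


# Hypothetical decision: two alternatives described by two attributes.
alternatives = [[1.0, 0.0], [0.0, 1.0]]
weights = [0.0, 0.0]
new_weights = train_step(weights, alternatives, chosen_index=1)
print(new_weights, predict(new_weights, alternatives))
```

Starting from zero weights the model predicts the first alternative (ties go to the first entry), the operator chose the second, and one correction is enough to make the prediction agree with the choice.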

A possible advantage of the pattern recognition technique over many of
the other forms of estimation is its flexibility of adjustment. Several
types of error correction are possible for the adjustment rule, each with a
different combination of speed, stability, and complexity. The three
principal forms are the fixed increment rule, the absolute correction rule,
and the fractional correction rule. These differ solely in their formulation
of the adjustment factor d in Equation 21.

The fixed increment rule simply assigns a non-zero constant to d. Thus
the movement of the weight vector is a constant proportion of the difference
in the predicted and chosen patterns. The correction may not be sufficient
to avoid subsequent errors with the same pattern, but the process is
eventually convergent (Duda and Hart, 1973). The fixed increment rule has
the advantages of simplicity and relative insensitivity to inconsistent
behavior.

A more rapid but potentially less stable rule is the absolute
correction rule. This method sets d to be the smallest integer at which
the error of the pattern is corrected. In the decision modeling situation,
this becomes:


$$d = \text{smallest integer} > \frac{|W \cdot (x_c - x_p)|}{(x_c - x_p) \cdot (x_c - x_p)} \tag{22}$$


in which

x_c is the attribute level vector of the operator-selected choice
x_p is the attribute vector of the predicted choice

The  fractional  correction  rule  is  similar  to  the  absolute  rule  but 
is  typically  less  extreme.  The  fractional  rule  moves  the  weight  point  some 
fraction  of  the  above  distance: 


$$d = \lambda \, \frac{|W \cdot (x_c - x_p)|}{(x_c - x_p) \cdot (x_c - x_p)} \tag{23}$$

where λ is a constant between 0 and 2.

All three of the adjustment rules have been proven convergent with
linearly separable patterns (Nilsson, 1965). The speed of convergence is
normally fastest with the absolute rule. This is illustrated for an
example series of adjustments in Figure 9. The set of four numbered lines
in the figure is a sequence of patterns. These patterns are shown as
hyperplanes in a 2-dimensional weight space. Each hyperplane represents
the difference between two multi-attribute vectors. The operator choice
is shown by the direction of the arrow at each pattern. The absolute
rule (the triangles in the figure) achieves correct prediction
after four observations, while the fixed rule (the circles) requires five.
Unfortunately, the absolute rule is expected to be less forgiving of
inconsistent behavior than the fixed or fractional rules. This is because
of the large responses the absolute rule makes to operator inconsistencies.
The fixed and fractional rules may exhibit a greater tendency to smooth or
average the behavior.
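
The three ways of choosing the adjustment factor d can be compared side by side in a short Python sketch. The function names and the numerical example are hypothetical. The absolute rule is written here as the smallest integer strictly greater than the ratio |W · y| / (y · y), with y = x_c - x_p, which is an interpretation of the rule as described above.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fixed_increment(d=1.0):
    """Fixed increment rule: d is a non-zero constant."""
    return d


def absolute_correction(weights, y):
    """Smallest integer d for which (W + d*y) . y becomes positive."""
    return math.floor(abs(dot(weights, y)) / dot(y, y)) + 1


def fractional_correction(weights, y, lam):
    """Move a fraction lam of the distance used by the absolute rule."""
    return lam * abs(dot(weights, y)) / dot(y, y)


def apply_correction(weights, y, d):
    """Equation 21 written with y = x_c - x_p."""
    return [w + d * yi for w, yi in zip(weights, y)]


# Hypothetical misprediction: the model preferred x_p, the operator chose x_c.
weights = [3.0, -1.0]
x_c, x_p = [1.0, 3.0], [2.0, 1.0]
y = [c - p for c, p in zip(x_c, x_p)]          # [-1.0, 2.0]

d_abs = absolute_correction(weights, y)         # corrects this pattern at once
d_frac = fractional_correction(weights, y, lam=0.5)  # moves only part of the way

corrected = apply_correction(weights, y, d_abs)
partial = apply_correction(weights, y, d_frac)
print(d_abs, dot(corrected, y), d_frac, dot(partial, y))
```

With these numbers the absolute rule fixes the error in one step, while the fractional rule with a small λ leaves the pattern misclassified but closer to the boundary, which is the smoothing behavior described above.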

AN  EXAMPLE.  For  an  example  of  how  the  adaptive  decision  analysis  approach 
is  applied,  consider  the  select  maneuver  decision.  Assume  it  has  already 
been  decided  that  the  goal  of  the  maneuver  should  be  to  evade. 

Assume  that  the  following  alternative  evasive  maneuvers  are  available: 


a. Sink to the bottom and hide.

b. Run (full speed in straight line).

c. Sink to bottom and deploy decoy.

d. Run in a zigzag pattern.

e. Run and deploy decoy.

The following attributes could be used:

Information Gain. This represents the expected information gained by
friend about the opponent as a result of the action being considered. This
is dependent on the probability (assessed by opponent) that friend has
already detected him. Thus, if friend already has a lot of information,
there is not much information left to be gained.

Deception.  This  is  the  expected  amount  of  false  information  gained  as  a 
result  of  decoying.  This  may  be  situation  dependent.  In  the  example, 
releasing  a  decoy  would  have  greater  deception  value  if  the  sub  is  resting 
on  the  bottom,  than  if  it  is  going  full  speed  ahead.  Also,  if  you  haven't 
yet  been  detected,  deploying  a  decoy  will  give  away  the  fact  that  you  are 
in  the  area. 

Vulnerability. This attribute represents your vulnerability to being hit
if you are detected. The attribute levels for vulnerability should be
subjectively estimated and defined in advance for each alternative.

Cost. This is the direct cost of the alternative. Cost may also be used
as a gross resource depletion attribute.

Effect  on  Mission  Objective.  This  attribute  should  be  redefined  with 
subjective  weights. 

Table  2  gives  an  example  of  the  MAU  approach  to  optimal  action 
selection.  Each  column  represents  one  alternative  action  that  the  CO  can 
choose  in  the  tactical  situation.  Each  row  represents  one  of  the  key 
attributes  by  which  each  action  is  evaluated.  The  values  in  the  table 
are  predetermined  or  calculated  from  the  features  of  the  tactical  situation. 
In  the  example  given,  the  tactical  situation  is  the  following: 

a.  The  opponent  is  80  percent  sure  friend  has  not  detected  him  (thus 
the  "run"  alternatives  may  cause  high  information  gain). 

b.  The  deception  effect  of  a  decoy  is  higher  if  chosen  with  a  "sink" 
rather  than  "run"  alternative. 

c. Cost includes fuel, weapons and decoy expenses; the costs of the five
alternatives are .1, .4, .6, .5 and .8, respectively.

The first column gives the utility value associated with each attribute
by a trainer. These values are the result of the adaptive training
algorithm. They are positive for good attributes (for the opponent
objectives) and negative for bad ones. The MAU processor will select the
alternative action that has the highest combined value. This is
done by forming, for each alternative, the weighted sum of utilities times
attribute levels. These values are calculated and rank ordered at the bottom
of the table. Alternative #1 turns out to have the highest value, and it is
the one the system will select. In a different tactical situation (e.g.,
friend has detected the opponent), the attribute levels of the various
options may be different, causing another action to come out on top; that
action would then be the one the opponent model selects to activate the
simulated opponent on the screen.
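
The weighted-sum selection can be sketched as follows. The utility weights and attribute levels are transcribed from the attribute rows of Table 2 as they appear in this copy; the dictionary keys and function name are hypothetical. The totals are recomputed from those rows rather than copied from the bottom row of the table, so they need not agree with the printed MAU values.

```python
# Utility weights (first column of Table 2), one per attribute.
WEIGHTS = {
    "information_gain": -1.0,
    "deception": 0.5,
    "vulnerability": -0.8,
    "cost": -0.2,
    "mission_effect": 0.2,
}
ATTRIBUTES = list(WEIGHTS)

# Attribute levels per alternative, in the same attribute order as WEIGHTS.
LEVELS = {
    "sink to bottom and hide": [0.0, 0.3, 1.0, 0.0, -0.9],
    "run": [0.7, 0.0, 0.5, 0.5, 1.0],
    "sink and deploy decoy": [0.5, 0.8, 1.0, 0.9, -1.0],
    "run in zigzag pattern": [0.8, 0.0, 0.2, 0.6, 0.7],
    "run and deploy decoy": [1.0, 0.5, 0.2, 1.0, 0.6],
}


def mau_value(levels):
    """Weighted sum of utility times attribute level for one alternative."""
    return sum(WEIGHTS[name] * level for name, level in zip(ATTRIBUTES, levels))


scores = {action: mau_value(levels) for action, levels in LEVELS.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
best_action = ranking[0]

for action in ranking:
    print(f"{scores[action]:6.2f}  {action}")
```

Changing the attribute levels to reflect a different tactical situation and rerunning the ranking is all that is needed to obtain a different selected action.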

THE  HEURISTIC  SEARCH  APPROACH 

STATE  SPACE  MODEL.  The  overall  objective  of  knowledgeable  opponent 
scenario  generation  is  to  provide  a  realistic  simulation  of  an  active  enemy. 
The  enemy  would  react  to  events  and  actions  taken  by  the  friendly  forces 
and  choose  a  course  of  action  that  would  lead  to  the  achievement  of  some 
enemy  goal,  which  usually  means  a  bad  outcome  for  the  friendly  forces.  The 
heuristic  search  approach  provides  such  a  mechanism. 

In  the  underlying  model,  which  is  called  the  "state  space"  model,  the 
problem  domain  (such  as  underwater  warfare)  is  expressed  in  terms  of 
"states,"  which  are  complete  descriptions  of  the  tactical  situations  as 
they  exist  at  some  particular  instant  of  time  (Nilsson,  1971).  An  "action" 
is  a  transformation  which,  when  applicable,  converts  one  state  into 
another.  Thus,  a  sequence  of  actions  ("plan"  or  "allocation")  converts 
some initial state into a final, or goal, state. The enemy submarine
commander asks the question, "What sequence of actions can transform the
current state into a goal state which satisfies my overall objectives?"
In other words, "How do I get from where I am to where I want to go?"

TABLE 2. ATTRIBUTE LEVELS, VALUES, AND EXPECTED VALUES
FOR EXAMPLE SCENARIO

                               Sink to            Sink and   Run in    Run and
                               bottom             deploy     zigzag    deploy
Attribute            Utility   & hide    Run      decoy      pattern   decoy

Information Gain      -1.0      0.0      0.7       0.5        0.8       1.0
Deception             +0.5      0.3      0.0       0.8        0.0       0.5
Vulnerability         -0.8      1.0      0.5       1.0        0.2       0.2
Cost                  -0.2      0.0      0.5       0.9        0.6       1.0
Effect on Mission
  Objective           +0.2     -0.9      1.0      -1.0        0.7       0.6

MAU Value of Choice    0.0     -0.47    -1.16     -0.98      -1.18     -0.77
Rank Order                      1        4         3          5         2
                              (Best)                        (Worst)

Before  a  system  can  perform  properly,  it  must  know  what  actions  are 
available,  under  what  circumstances  they  can  be  applied,  what  their  effects 
are,  and  what  possible  states  can  arise  from  their  use. 

BASIC  SEARCH  TECHNIQUES.  The  most  basic  search  techniques  are  systematic 
expansions  of  the  state  space.  Starting  from  the  start  node  (labeled  1  in 
Figure  10--the  current  state),  the  search  algorithm  expands  all  its 
possible  successive  nodes.  When  a  goal  node  is  encountered,  the  path  from 
the  initial  node  to  that  goal  node  is  the  solution  sought.  In  the  ASW 
case,  it  is  the  strategy,  or  sequence  of  actions,  the  commander  has  to 
take  to  reach  his  objective. 

Figure  10  shows  the  most  elementary  algorithms--the  "breadth-first" 
and  the  "depth-first"  algorithms,  respectively.  In  the  "breadth-first" 
algorithm,  each  node  is  expanded  completely--all  its  "sons"  identified-- 
before  the  next  is  started.  This  method  is  guaranteed  to  find  the  shortest 
path  from  the  start  to  the  goal  nodes.  The  numbers  in  Figure  10  indicate 
the  order  of  node  expansion. 

In the "depth-first" algorithm, each alternative line of inquiry is
sought to the fullest depth before other alternatives are evaluated. When
such a search fails, the algorithm tries the next deepest possibility.
Figure 10 also shows the order of node expansion in this algorithm. The
depth-first algorithm does not guarantee the shortest path to a goal if more
than one goal node exists.
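
The two blind searches can be sketched in Python on a small state space. The graph below is hypothetical and is not the tree of Figure 10; it is chosen so that the goal can be reached by a short and a long path.

```python
from collections import deque

# Hypothetical state space: each state maps to its successor states.
SUCCESSORS = {
    "S": ["A", "B"],
    "A": ["C", "D"],
    "B": ["G"],
    "C": ["G"],
    "D": [],
    "G": [],
}


def breadth_first(start, goal):
    """Expand states level by level; the first path found is a shortest one."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        state = path[-1]
        if state == goal:
            return path
        for nxt in SUCCESSORS[state]:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None


def depth_first(start, goal, path=None):
    """Follow each line of inquiry to its full depth before backing up."""
    path = (path or []) + [start]
    if start == goal:
        return path
    for nxt in SUCCESSORS[start]:
        if nxt not in path:
            found = depth_first(nxt, goal, path)
            if found:
                return found
    return None


bfs_path = breadth_first("S", "G")
dfs_path = depth_first("S", "G")
print(bfs_path, dfs_path)
```

On this graph the breadth-first search returns the two-step plan, while the depth-first search commits to the first branch and returns a longer one.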

These search methods are "blind" methods because they develop
systematically every node in the state space without using any information
which may be known in advance about the particular problem domain, or the
particular knowledge found in the nodes that have already been expanded, to
guide the search process. The heuristic search approach is the class of
algorithms that uses such domain-specific knowledge to guide the search.

HEURISTIC SEARCH METHODS. Heuristic search methods try to utilize any
information known about the problem domain to guide the search for a
solution in the state space. The added information helps avoid the
combinatorial explosion of computer resources (time and memory) needed for
the basic search techniques. Figure 11 illustrates the basic idea of the
heuristic search approach by comparing it to depth-first and breadth-first
searches. The contours of node expansion are directed toward the goals G1
and G2, in contrast to the blind search algorithm. Applying a heuristic
search usually leads to the discovery of optimal or suboptimal solutions in
cases that would be too big to handle by standard techniques. Many
achievements of heuristic search are known. For example,

a. Computer Aided Design (Powers, 1973; Hagendorf et al, 1975).

b. Test Sequence Generation for Detection of Failures in Clockmode
Sequential Circuits (Hill and Huey, 1977).

c. Edge and Contour Detection (Martelli, 1976).

d. Chromosome Matching (Montanari, 1970).

e. Organic Chemical Synthesis (Sridharan, 1973).

f. Ballistic Missile Defense (Leal, 1977).

g. Discovery of Mathematical Concepts (Lenat, 1978).

Figure 11. Expansion Contours of Depth-First, Breadth-First,
and Heuristic Search Methods (G1, G2: goal nodes; outer boundary:
search tree limit)

The heuristic information can be contained in different parts of the
search algorithm. If r is the function that generates node successors and
f(n) is an estimate of the promise of node n to be on the path to a goal
node, then the heuristic information may be contained in either of them.
Using knowledge in r, the search algorithm would generate first the more
probable successors of a node. On the other hand, using knowledge in f(n),
the most promising nodes would be selected for subsequent development
ahead of less promising ones.
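
A search that uses f(n) to decide which node to develop next can be sketched as a best-first search. The graph, the estimate values, and the convention that a lower estimate means a more promising node are all assumptions made for this illustration.

```python
import heapq

# Hypothetical state space and promise estimates f(n); lower is more promising.
SUCCESSORS = {
    "S": ["A", "B"],
    "A": ["C", "D"],
    "B": ["G"],
    "C": ["G"],
    "D": [],
    "G": [],
}
ESTIMATE = {"S": 3, "A": 2, "B": 1, "C": 1, "D": 3, "G": 0}


def best_first(start, goal):
    """Always develop the open node with the best estimate f(n)."""
    frontier = [(ESTIMATE[start], [start])]
    visited = set()
    expanded = []
    while frontier:
        _, path = heapq.heappop(frontier)
        state = path[-1]
        if state in visited:
            continue
        visited.add(state)
        expanded.append(state)
        if state == goal:
            return path, expanded
        for nxt in SUCCESSORS[state]:
            if nxt not in visited:
                heapq.heappush(frontier, (ESTIMATE[nxt], path + [nxt]))
    return None, expanded


path, expanded = best_first("S", "G")
print(path, expanded)
```

With these estimates the search never develops A, C, or D: the knowledge carried by f(n) steers the expansion directly toward the goal, which is the effect the contours in Figure 11 are meant to convey.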

THE MINIMAX AND ALPHA-BETA ALGORITHMS. Two algorithms which have particular
applicability to the case of military confrontation are the minimax and the
alpha-beta algorithms. The minimax is applicable in zero-sum adversary
confrontations where what is good for one side is bad for the other. When
developing the state space of such a problem, the prudent decision maker has
to assume that, when given the choice, the enemy would select the alternative
which is the most damaging to the decision maker's own objectives. When
expanding the search space for this problem, as shown in Figure 12, the
commander first determines all the alternatives available to him. This is the
maximizing level because at this level the commander has the choice, and he
will obviously choose the alternative that maximizes his measure of success.
The next level is the set of responses available to the enemy for each of
the commander's choices. Here the enemy will make the choice, and he will
choose the worst alternative (from the commander's point of view). Thus,
this layer is called the minimizing level. The alternation of maximizing and
minimizing layers continues downward in the tree until the allocated
computing resources are used up. At that point, the static value of each tip
node is evaluated. The value of a tip node is a measure of how "good" the
state represented by the node is from the commander's point of view. If the
layer of nodes just above the tip nodes is a "maximizing" layer, each node
in it assumes the maximal value of its "children" nodes (and vice versa
for a minimizing layer). These "backed-up" values propagate upward in the
state space tree until they reach the top layer. The minimaxed values
that reach the layer just under the current state (the root of the tree)
are the basis of the commander's choice among the alternative actions
available to him. This "minimaxing" algorithm is repeated for every
decision the simulated commander has to make; thus, it takes into account
the dynamics of the situation, and it finds the best tactical move
foreseeing the best choice of the enemy. In this algorithm, the heuristic
information is contained in the tip node evaluation function f(n) described
in the previous section.
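
The backing-up of values can be sketched in Python. The nested-list tree representation and the tip values are hypothetical: an inner list stands for a node with children, and a number stands for a tip node whose static value has already been computed.

```python
def minimax(node, maximizing):
    """Back up static tip values through alternating max and min layers."""
    if not isinstance(node, list):
        return node  # tip node: its static evaluation
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def best_move(tree):
    """Choose among the commander's alternatives at the root (a max level)."""
    # Each child of the root is an enemy-response (minimizing) node.
    backed_up = [minimax(child, maximizing=False) for child in tree]
    return backed_up.index(max(backed_up)), backed_up


# Hypothetical two-ply tree: three commander alternatives, each with two
# enemy responses whose resulting states have been statically evaluated.
tree = [[0.3, 0.2], [0.1, 0.9], [0.5, 0.4]]
choice, backed_up = best_move(tree)
print(choice, backed_up)
```

The second alternative contains the most attractive tip (0.9), but the enemy would never allow it, so its backed-up value is only 0.1 and the third alternative is selected.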

The alpha-beta algorithm is an improved version of the basic minimax
algorithm. The alpha-beta algorithm is a systematic method to reduce the
number of nodes that have to be evaluated and even makes it unnecessary to
expand complete branches of the state space tree. It can be shown that
although the algorithm allows a large part of the search tree to be
completely ignored, it will not lose any solution that the basic minimax
algorithm would find.

The  alpha-beta  algorithm  starts  with  a  depth-first  expansion  of  the 
tree  down  to  some  level  n  (see  Figure  13).  When  the  depth  limit  is  reached, 
the  tip  nodes  are  evaluated  and  temporary  values  are  backed-up  in  the  tree. 
The  alpha-beta  technique  takes  advantage  of  these  preliminary  values. 
Consider,  in  Figure  13,  the  maximizing  node  A  in  the  tree  after  nodes  4-9 
have  been  developed  below  it.  A  has  been  assigned  a  temporary  value  of 
0.2  (propagated  from  node  5).  B,  which  is  a  minimizing  node,  has  been 
assigned  a  temporary  value  of  0.1  (propagated  from  node  9). 

At  this  time,  there  is  no  point  developing  any  other  successor  to  the 
node  B  (such  as  C)  because,  since  it  is  a  minimizing  node,  the  best  value 
B  can  get  is  0.1  or  lower,  and  node  A,  being  a  maximizing  node,  will  always 
select 0.2 over 0.1. This argument is the "alpha" half of the alpha-beta
pruning.  The  empty  nodes  in  Figure  13  show  all  the  subtrees  that  will  be 
pruned  off  and  the  order  of  node  generation.  In  fact,  the  empty  nodes  need 
not  be  generated  at  all. 

The  "beta"  half  operates  in  precisely  the  reverse  for  nodes  in  the 
minimum  layers.  By  using  the  alpha-beta  algorithm,  the  tree  can  be  explored 
approximately  twice  as  deep  as  a  simple  minimax  algorithm,  while  expanding 
the  same  number  of  nodes.  The  algorithm  is  somewhat  slower,  inasmuch  as 
it  has  to  do  the  bookkeeping  for  the  temporary  alpha  and  beta  values.  The 
alpha-beta  algorithm  is  a  very  promising  potential  opponent  model. 
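
A compact sketch of alpha-beta pruning is given below. The nested-list tree and its values are hypothetical and are not the tree of Figure 13, although they are chosen to reproduce the 0.2 versus 0.1 cutoff argument made above. The `visited` list is an illustrative device for recording which tip nodes were actually evaluated.

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf"),
              visited=None):
    """Minimax value of node, skipping branches that cannot affect it."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)  # record each tip that gets evaluated
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizing parent will never allow this branch
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizing parent already has a better alternative
    return value


# Hypothetical tree: a maximizing root over three minimizing nodes.
tree = [[0.3, 0.2], [0.1, 0.9], [0.5, 0.4]]
visited = []
value = alphabeta(tree, maximizing=True, visited=visited)
print(value, visited)
```

After the first minimizing node backs up 0.2, the second one is abandoned as soon as its first tip returns 0.1, so the tip valued 0.9 is never evaluated; the result is the same as that of plain minimax on the same tree.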

ADVANTAGES 

a. Heuristic search techniques have a wide range of applicability, as
can be seen from the examples mentioned above.

b.  The  underlying  structure  (state-space,  AND/OR  graphs)  is  very 
general  and  fits  naturally  all  problems  of  a  combinational  nature  and  all 
hierarchical  problems  which  can  be  decomposed  into  goals  and  subgoals 
(this  includes  decision  trees). 

c.  General  theoretical  results  are  available. 

d.  It  is  universally  accepted  that  heuristics  are  crucial  to  cope 
with  intractable  problems. 

SCOPE  AND  LIMITATIONS 

a. Heuristic search techniques are designed for problems of a particular
nature only, with well-defined states, subgoals or subproblems.
Problems with a continuous nature, for instance planning in a continuum,
cannot be solved via heuristic search.

b. The use of heuristic search itself poses a problem. The more
specific a heuristic function, the more efficient it is in guiding the
search. How well designed and problem-specific the heuristics are will
therefore determine their efficiency.

c. Heuristic search might be subject to catastrophes (if no solution
is found before the computational resources are exhausted, or an
insufficiently good solution is found).

PRODUCTION  RULES  APPROACH 

OVERVIEW.  Production  rule  systems  represent  another  successful  approach 
for  knowledge  representation  and  deductive  mechanisms.  This  approach  is 
similar  to  the  heuristic  search  approach  in  that  it  uses  a  modification  of 
the  state  space  model  as  the  underlying  conceptualization  (see  definition 
in  the  section  on  heuristic  search).  The  technique  of  representing  the 
knowledge  is  different,  however,  and  so  is  the  mechanism  which  finds  the 
path from the current state to the goal state. The problem-specific
knowledge (heuristics) is packaged in production-rule systems as small
modular "chunks" called productions.

A production is a rule which consists of a situation-recognition
part and an action part. Thus a production is a "situation-action" pair
in  which  the  left  side  is  a  list  of  things  to  watch  for  in  the  description 
of  the  current  state  of  the  world,  and  the  right  side  is  the  list  of  things 
to  do  in  that  case. 

In the case of submarine warfare, a production that guides the
commander's actions may be something like:

IF      Enemy dominates area
  AND   Enemy has not yet detected you
  AND   You are out of his torpedo range
  AND   You are in very shallow water
THEN    Escape by sinking to bottom in silence

The  effect  of  such  a  production  is  to  respond  to  the  situation  when 
all  the  aspects  combined  by  the  AND  are  present  and  change  the  current 
action  from  whatever  it  was  before  to  ESCAPE. 

In addition to the large set of such productions, the production
rule system contains a triggering mechanism that uniformly checks all the
productions that apply in a given situation (by testing for truth of the
left hand side of each production) and applies those that are applicable,
causing the situation to change.

The main advantages of the production rule approach are the ease and
modularity of the knowledge representation. Consequently, it is easy to
elicit information from experts without requiring that they be programmers.
In fact, many training manuals are already written in "production rule
style." Furthermore, the information is incremental; thus it is easily
modified, updated and expanded into new areas of expertise. It is also
usually argued by production rule proponents that this form of knowledge
representation is highly compatible with human cognition, making it a very
useful and powerful training tool. For example, suppose an opponent commander
model is built as a production rule system. It becomes very easy to
communicate with the system and ask "Why have you done that?", meaning what
aspects of the situation or what actions of the trainee caused some
unexpected response of the simulated enemy commander.

The  trainee  can  discover  specifically  where  he  went  wrong,  and  he  can 
start  in  mid  action  and  try  other  alternatives.  At  the  same  time,  this  is 
also  a  powerful  debugging  tool  allowing  experts  to  tune  the  system  by 
following  its  reasoning  process  and  identifying  the  specific  cause  for  a 
mistaken  conclusion  which  led  to  an  unreasonable  response. 

THE PRODUCTIONS. As AND/OR graphs (graphs with nodes combined by logical
AND or OR functions), production systems are composed of two parts: the
set of productions and a mechanism to find a solution in a given situation.
We will discuss first a graphic representation of the productions
themselves. A simple production specifies a single conclusion which follows
from the simultaneous satisfaction of the situation recognition conditions.
Any particular conclusion may spring from any production. The conclusion
specified in a production follows from the AND or "conjunction" of the facts
specified in the premise recognition part. A conclusion reached by more
than one production is said to be the OR or "disjunction" of those
productions. Depicting these relationships graphically produces an AND/OR
graph. Figure 14 shows an AND/OR graph which reaches from base tactical
facts (F_i) on the left, through the different productions (P_i), to a
conclusion or an act to be taken, on the right side of the figure. Any
collection of productions implies such a graph. In Figure 14 we used the
set of submarine warfare productions given in Figure 15. These productions
should be taken as an example of the capabilities of this approach.

The  arrangement  of  nodes  in  this  graph  focuses  on  how  the  conclusion 
can  be  reached  by  various  combinations  of  basic  facts.  As  with  ordinary 
AND/OR  trees,  a  conclusion  is  verified  if  it  is  possible  to  connect  it  with 
basic  facts  through  a  set  of  satisfied  AND/OR  nodes.  Different  sets  of 
facts  can  be  used  to  reach  a  given  conclusion  by  selecting  different 
branches  at  OR  nodes. 

Sometimes  it  is  useful  to  look  at  the  implied  graph  to  get  a  better 
feel  for  the  problem  space,  noting  whether  the  reasoning  is  likely  to  be 
broad  and  shallow,  narrow  and  deep,  or  broad  and  deep.  Again,  however, 
caution is in order. When used prominently in discussions of goals and
subgoals, AND/OR graph representations tend to make control look like a
search problem with the various search ideas becoming applicable. This
position has its good and bad features. One bad feature is that it can
create a tendency to waste time with an existing problem space rather than
to make a better space, where less search, if any, would be needed.

P1
IF
   OR   Location near enemy harbor
        3 or more enemy ships in the area
        Anti-sub ship in area
        Nuclear enemy sub
THEN    ENEMY DOMINATES SCENE

P2, P3
IF
   AND  SELF within sensor range
        Enemy changed course
THEN    SELF DETECTED
ELSE    NOT DETECTED

P4
IF
   AND  SELF on passive mission
        SELF not detected
        Enemy dominates area
THEN    ESCAPE

P5
IF
   AND  ESCAPE
        SELF in deep water
THEN    Sink deep and run

P6
IF
   AND  ESCAPE
        SELF in shallow water
THEN    Sink to bottom in silence

P7
IF
   AND  ESCAPE
        SELF in island area
THEN    Hide behind an island

P9, P10
IF
   OR
      AND  One enemy sub in area
           SELF in deep water
           Enemy sub of same type
      AND  Enemy surface ship alone
           No ASW in air
THEN    SELF DOMINATES

Figure 15. Production Rule Example

THE  CONTROL  MECHANISM.  The  control  mechanism  which  utilizes  the  set  of 
productions  takes  a  collection  of  known  facts  about  the  situation  and  makes 
new  conclusions  according  to  productions  that  are  satisfied  by  the  initial 
facts.  In  operation,  the  user  would  first  gather  up  all  facts  available 
and  present  them  to  the  system.  The  control  mechanism  will  then  scan  the 
production  list  for  a  production  which  has  a  matching  situation  part,  i.e., 
all  the  premises  in  the  left  hand  side  are  satisfied.  This  production  will 
be  activated  and  its  action  side  will  change  the  facts  known  about  the 
situation. In the example given, if P1 is activated, it adds the
conclusion that the "enemy dominates the area" to the situation description.

Reasoning  from  base  facts  to  a  conclusion  rarely  entails  using  only 
a  single  step,  however.  More  often,  intermediate  facts  are  generated  and 
used,  making  the  reasoning  process  more  complicated  and  powerful.  One 
consequence  is  that  the  individual  productions  involved  can  be  small, 
easily  understood,  easily  used,  and  easily  created.  Also  notice  that  the 
intermediate  facts  added  by  the  lower  level  productions  are  tactical  facts 
meaningful  to  the  military  users  of  the  system,  resulting  in  many  benefits. 
Using this approach, a simulated submarine commander can produce a chain
of conclusions leading to intelligent tactical actions, even as a trainee
commander takes his actions dynamically.
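
The control mechanism described above can be sketched in a few lines of
Python. This is only an illustration: the rule names R1 to R3 and the
wording of the facts are assumptions loosely based on the production rule
figure, not the report's actual rule set.

```python
from typing import FrozenSet, List, Set, Tuple

# (name, premises, conclusion). Rule names and fact wording are illustrative.
Production = Tuple[str, FrozenSet[str], str]

RULES: List[Production] = [
    ("R1", frozenset({"one enemy sub in area", "self in deep water",
                      "enemy sub of same type"}), "self dominates"),
    ("R2", frozenset({"enemy surface ship alone", "no ASW in air"}),
     "self dominates"),
    ("R3", frozenset({"self dominates"}), "attack enemy sub"),
]


def forward_chain(facts: Set[str],
                  rules: List[Production]) -> Tuple[Set[str], List[str]]:
    """Fire satisfied productions until no new fact can be added."""
    known = set(facts)
    fired: List[str] = []
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            # A production matches when all of its premises are known facts.
            if premises <= known and conclusion not in known:
                known.add(conclusion)  # intermediate fact, usable by later rules
                fired.append(name)
                changed = True
    return known, fired


known, fired = forward_chain(
    {"enemy surface ship alone", "no ASW in air"}, RULES)
print(fired)
```

The intermediate fact "self dominates" is what lets the second production
enable the third, which is the multi-step reasoning the text describes.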

In the event that many productions have premises or situation
specifications that are satisfied simultaneously, there must be some way of
deciding among them. Here are some of the popular methods:

a. All productions are arranged in one long list. The first matching
production is the one used. The others are ignored.

b. The matching production with the toughest requirements is the
one used, where "toughest" means the longest list of constraining premises
or situation elements.

c.  The  matching  production  most  recently  used  is  used  again. 

d.  Some  aspects  of  the  total  situation  are  considered  more  important. 
Productions  matching  high  priority  situation  elements  are  privileged. 
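
The four selection methods listed above might be sketched as follows. The
`Rule` class, the `last_used` counter, and the `element_priority` table are
hypothetical bookkeeping introduced for this illustration; the report does
not specify how these would be stored.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Rule:
    name: str
    premises: FrozenSet[str]
    last_used: int = -1  # assumed: step at which the rule last fired


def matching(rules: List[Rule], facts: Set[str]) -> List[Rule]:
    """All rules whose premises are satisfied, in list order."""
    return [r for r in rules if r.premises <= facts]


def first_match(rules: List[Rule], facts: Set[str]) -> Optional[Rule]:
    m = matching(rules, facts)          # method (a)
    return m[0] if m else None


def most_specific(rules: List[Rule], facts: Set[str]) -> Optional[Rule]:
    m = matching(rules, facts)          # method (b): longest premise list
    return max(m, key=lambda r: len(r.premises)) if m else None


def most_recent(rules: List[Rule], facts: Set[str]) -> Optional[Rule]:
    m = matching(rules, facts)          # method (c)
    return max(m, key=lambda r: r.last_used) if m else None


def highest_priority(rules: List[Rule], facts: Set[str],
                     element_priority: Dict[str, int]) -> Optional[Rule]:
    m = matching(rules, facts)          # method (d)

    def score(r: Rule) -> int:
        # A rule is as important as its most important situation element.
        return max((element_priority.get(p, 0) for p in r.premises), default=0)

    return max(m, key=score) if m else None


RULES = [
    Rule("A", frozenset({"enemy sub in area"}), last_used=1),
    Rule("B", frozenset({"enemy sub in area", "self in deep water"}), last_used=3),
    Rule("C", frozenset({"ASW aircraft overhead"}), last_used=7),
]
FACTS = {"enemy sub in area", "self in deep water", "ASW aircraft overhead"}
PRIORITY = {"ASW aircraft overhead": 9}

for pick in (first_match(RULES, FACTS), most_specific(RULES, FACTS),
             most_recent(RULES, FACTS), highest_priority(RULES, FACTS, PRIORITY)):
    print(pick.name)
```

With the same facts, the four methods can select different productions,
which is why the choice of method is part of the design.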

So far, the deduction-oriented production system is assumed to work
from known facts to new, deduced facts. Running this way, a system is
said to exhibit "forward chaining," but "backward chaining" is also
possible, for the production system user can hypothesize a conclusion or
a desired final state and use the productions to work backward toward an
enumeration of the facts that would support the hypothesis. For example
(see Figure 14), in the case of a submarine commander, the system can start
from the mission, e.g., attack enemy sub. Then, chaining backward from
(P10), it will conclude that it has to achieve self-dominance. This can
be achieved by confronting an enemy surface ship (P9) or an enemy sub of
the same type in deep water (P8). Thus, by a small change of orientation,
the same set of productions is used backward. Knowing that a deduction-
oriented production system can run forward or backward, which is better?

The  question  is  decided  by  the  purpose  of  the  reasoning  and  by  the  shape 
of  the  problem  space.  Certainly,  if  the  goal  is  to  discover  all  that  can 
be  deduced  from  a  given  set  of  facts,  then  the  production  system  must  run 
forward.  The  production  system  can  run  forward  from  all  premise  elements 
as  long  as  suitable  productions  exist.  Using  sensory  systems  to  supply  more 
facts  is  necessary  only  when  no  productions  apply,  and  no  conclusion  has 
been  reached.  On  the  other  hand,  if  the  purpose  is  to  verify  or  deny  a 
particular  conclusion,  or  reach  a  desired  situation  through  a  sequence  of 
actions,  then  the  production  system  is  probably  best  run  backward  from  that 
conclusion.  Avoiding  needless  fact  accumulation  is  one  result  obtained; 
indeed,  no  irrelevant  facts  need  be  checked  at  all. 
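
The backward-chaining example (start from the mission and enumerate the
facts that would support it) can be sketched as below. The rules are the
same illustrative stand-ins for P8 to P10 used for explanation only, and
the sketch assumes the rule set contains no cycles.

```python
from itertools import product
from typing import FrozenSet, List, Tuple

# (name, premises, conclusion). Illustrative stand-ins, not the report's rules.
Production = Tuple[str, FrozenSet[str], str]

RULES: List[Production] = [
    ("R1", frozenset({"one enemy sub in area", "self in deep water",
                      "enemy sub of same type"}), "self dominates"),
    ("R2", frozenset({"enemy surface ship alone", "no ASW in air"}),
     "self dominates"),
    ("R3", frozenset({"self dominates"}), "attack enemy sub"),
]


def supports(goal: str, rules: List[Production]) -> List[FrozenSet[str]]:
    """Enumerate alternative sets of base facts that would establish `goal`.

    Assumes an acyclic rule set; a cyclic one would recurse forever.
    """
    producers = [r for r in rules if r[2] == goal]
    if not producers:
        # No production concludes this, so it must be observed directly.
        return [frozenset({goal})]
    alternatives: List[FrozenSet[str]] = []
    for _name, premises, _conclusion in producers:
        # Every premise needs support; take one alternative per premise.
        per_premise = [supports(p, rules) for p in sorted(premises)]
        for combo in product(*per_premise):
            alternatives.append(frozenset().union(*combo))
    return alternatives


for option in supports("attack enemy sub", RULES):
    print(sorted(option))
```

Only facts relevant to the hypothesis are ever examined, which mirrors the
point that backward chaining avoids needless fact accumulation.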

Deciding  whether  forward  chaining  or  backward  chaining  is  better 
depends,  in  part,  on  the  shape  of  the  space.  Figure  16  illustrates  this 
by  way  of  two  symmetric  situations.  All  possible  states  are  represented 
along with the operations that can change one state into a neighbor. In
the first situation shown, forward chaining is better because there is a
general fan-in from the typical initial states toward the typical goal
states. It is hard to get into a dead end. In the second situation, the
shape favors backward chaining since there is fan-out.

ADVANTAGES.  Proponents  of  production  rule  systems  usually  cite  one  or 
more  of  the  following  advantages: 

a. Production systems provide a powerful model of the basic human
problem solving mechanisms. This results in easy expert elicitation, user
communication at the comfortable level of military tactical concepts and
terms, easy trouble-shooting, and good training capability.

b.  System  states  are  meaningful  to  users,  debuggers,  etc.;  thus  an 
evaluation  can  be  made  on  the  tactical  level  rather  than  in  the  computer 
implementation  level. 

c. Production systems enforce a homogeneous representation of
knowledge, effectively separating the static data representation from the
uniformly applied evaluation mechanism.

d. The control mechanism is simple and explicit about what to do next;
it is clear from the current state which productions are available.

e. Production systems allow incremental growth through the addition
of individual productions, without requiring changes to any others.

f. Production systems allow unplanned but useful interactions
which are not possible with control structures in which all procedural
interactions are determined beforehand. A piece of knowledge, or a
combination of such, can be applied whenever appropriate, not just whenever a
programmer predicts it can be appropriate. This can lead to highly
intelligent performance by systems with a surprisingly small (several hundred)
set of productions.

g. An explanation capability is natural to implement. When some
decision is made, the system can present the sequence of productions
that led to that conclusion, thus exposing its "reasoning" about the
situation.


h. The production rule approach is as general as any other method
based on the state space model.

i.  Productions  can  be  quantified  with  probability  information  leading 
to  applicability  in  decision  making  and  risk  evaluation. 
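
The explanation capability in item g can be illustrated by recording, for
each deduced fact, the production that produced it. The rules and the
output wording below are assumptions made for this sketch.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Production = Tuple[str, FrozenSet[str], str]  # (name, premises, conclusion)

RULES: List[Production] = [
    ("R2", frozenset({"enemy surface ship alone", "no ASW in air"}),
     "self dominates"),
    ("R3", frozenset({"self dominates"}), "attack enemy sub"),
]


def chain_with_trace(facts: Set[str], rules: List[Production]
                     ) -> Dict[str, Tuple[str, FrozenSet[str]]]:
    """Forward chain and remember which production derived each new fact."""
    known = set(facts)
    why: Dict[str, Tuple[str, FrozenSet[str]]] = {}
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                why[conclusion] = (name, premises)
                changed = True
    return why


def explain(fact: str, why: Dict[str, Tuple[str, FrozenSet[str]]],
            indent: int = 0) -> List[str]:
    """Return the chain of productions behind `fact`, one line per step."""
    pad = " " * indent
    if fact not in why:
        return [f"{pad}{fact} (given)"]
    name, premises = why[fact]
    lines = [f"{pad}{fact} (by {name})"]
    for premise in sorted(premises):
        lines.extend(explain(premise, why, indent + 2))
    return lines


WHY = chain_with_trace({"enemy surface ship alone", "no ASW in air"}, RULES)
print("\n".join(explain("attack enemy sub", WHY)))
```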

DISADVANTAGES. Some of the advantages of the production rule approach can
become disadvantages if care is not exercised in the design process:

a. Maintaining focus of attention: It would seem that PR systems
allow knowledge to be tossed into the system homogeneously and incrementally
without worry about relating new knowledge quanta to old. Thus, by
relinquishing control, such systems allow unimportant productions to usurp
center stage from more important productions, leading the process astray.

b. Size problems: One particular problem is that production systems
may break down if the amount of knowledge is too large, or when the number
of productions grows beyond reasonable bounds. The advantage of not needing
to worry about the interactions among the productions can become the
disadvantage of not being able to influence the interactions among the larger
number of productions.

A possible solution, of course, is to partition the facts and
the productions into subsystems such that at any time only a manageable
number are under consideration. Within each subsystem, some productions
may be devoted to arranging transfer of information or attention to another
subsystem. Curiously, some users of Hewitt's ACTORS language produce
programs that have a strong resemblance to systems of communicating
production subsystems.


This  solution,  however,  goes  against  one  of  the  main  advantages 
of  production  rule  systems,  namely,  modularity  and  independent  control. 
If  control  guiding  productions  are  added,  we  again  have  the  problem  of 
explicitly  directing  where  control  should  go. 


c. Global effects: It is awkward to represent global effects using
the PR approach. Here, again, the modularity of the productions requires
that if some global effect (such as weather in ASW) takes part in many
productions, the whole set of productions which behave differently must
be duplicated for each different weather state.

SECTION V
MODEL  EVALUATION 


EVALUATION  ATTRIBUTES 

The  attributes  for  evaluating  different  opponent  models  are  described 
below.  These  attributes  are  divided  into  three  categories: 

a.  Modeling  Attributes. 

b.  Development  Attributes. 

c.  Performance  Attributes. 

MODELING  ATTRIBUTES 

a. Flexibility for Modeling Different Opponents. How easy it is to
change the apparent tactical behavior of the opponent, such as smart/dumb,
aggressive/defensive, cautious/risky, type of simulated sub, and mission
type.

b. Ability to Model Subjective Operator Decision Criteria. How well
the model deals with subjectivity. Can the model make use of the
operator's internal preference-value structure?

c.  Modeling  Continuous  Behavior.  Continuous  behavior  means  that 
the  parameters  representing  the  behavior  (sub  x,  y  location)  can  vary  in 
infinitesimal  increments  rather  than  between  a  few  discrete  alternatives. 

d. Modeling the Flow of Control. (Representing in a flexible
manner the sequence of processing.) Processing may be a decision (selecting
among alternatives or assessing a situation), or it may be an action. The
flow of control may further be parallel or sequential, instantaneous or
protracted, synchronous or asynchronous, and event driven or schedule
driven.

e. Modeling AND and OR Conditions. Can the model represent
complicated, logically structured criteria (i.e., a set of conditions linked
by AND's and OR's) for making a decision?

f.  Modeling  Probabilities.  The  capability  of  the  model  to  respond 
to  probabilistic  inputs  and  to  give  probabilistic  outputs  (or  make  a 
Monte  Carlo  selection  of  outputs). 

g.  Conciseness  of  Representation.  The  quantity  of  parameters,  data, 
or  code  needed  to  represent  a  particular  behavior. 

h. Adaptiveness. Can the model automatically modify its own
parameters in response to external events? The training is done on-line, in
task, and in real time.

i. Dependencies Among Input Variables. The difficulty of applying
the model when dependent relationships among the input variables exist.

j. Auxiliary Payoffs. This represents extra features available
with  the  particular  modeling  approach.  Examples  are:  ability  to  explain 
decision  selections,  ability  to  output  relative  desirability  of  the 
alternatives,  performance  measures,  etc. 

DEVELOPMENT  ATTRIBUTES 

a.  Scenario  Set-Up  Time.  This  is  the  time  and  effort  required  to 
specify  a  new  scenario  or  enemy  behavior.  This  function  is  done  by  the 
instructor  ahead  of  the  training  session. 

b.  Required  Development  and  Implementation  Time  and  Cost.  This 
includes  the  time  spent  by  analysts,  the  amount  of  research  required,  the 
required  size  and  complexity  of  the  software,  ease  of  debugging,  computer 
resources  required,  etc. 

c. Required Integration Time and Cost. The difficulty of
integrating the new software into the current SCST software systems.

d. Vulnerability to Increase in the Size of the State Space. This
represents the degree to which development, implementation, and modeling
difficulty increases with the size of the state space. More vulnerability
means  that  the  complexity  increases  more  rapidly  than  the  increase  in 
state  space  size.  Vulnerability  carries  the  risk  of  the  problem  "blowing 
up"  or  becoming  intractable. 

PERFORMANCE  ATTRIBUTES 

a.  Instructor  Time  Needed  for  Operation.  The  amount  of  effort  and 
interaction  required  of  the  instructor  during  operation.  Hopefully,  the 
instructor's  burden  would  be  decreased  rather  than  increased. 

b.  Instructor  Control.  This  represents  problems  of  synchronizing 
the  model  to  allow  smooth  transitions  from  instructor  control  to  model 
control  and  vice  versa. 

c.  Required  Computer  Resources.  Run  time  and  memory  requirements 
during  model  operation. 

d.  Trainee  Evaluation  and  Performance  Measurement.  This  represents 
the  degree  to  which  trainee  performance  measures  are  naturally  and  readily 
available  from  the  model. 

e.  Real  World  Fidelity.  The  degree  to  which  the  model  reflects 
real  world  behavior  patterns. 

EVALUATION BY MODELING ATTRIBUTES

a. Flexibility of Modeling Different Opponents. In evaluating
flexibility we are not considering the number of parameters that have
to be adjusted to bring about a particular behavior, because any
predefined set can be brought in from back-up memory at essentially the
same speed. Rather, we are concerned with how easy it is to obtain the
parameters and identify the parameters that have to be replaced. This
is related to how transparent the representation is with respect to
knowing what behavior a particular parameter creates and vice versa.
The Adaptive Decision approach is the easiest in that a particular
behavior can be generated automatically by training the system on
samples of the desired behavior. However, this approach is not
transparent unless all the attributes used are explicitly meaningful
to the decision maker. The Production Rules approach offers the
greatest transparency and clarity because particular behaviors are
generated in a few localized productions and they are stated there in
(almost) plain language rather than as a collection of numbers. The
Elicited Probability approach is non-automatic (the conditional
probabilities, etc., have to be elicited explicitly from experts) and it
is also less transparent than the Adaptive Decision Analysis approach
because more parameters are needed to represent a given behavior. With
the Heuristic Search approach, the heuristic, pruning, and generating
functions can be changed, but the changes necessary to obtain a
particular behavior are not immediately derivable from the desired
behavior. The rank order (starting with the most flexible and
transparent approach) is:

(1)  Production  Rules. 

(2)  Adaptive  Decision  Analysis. 

(3)  Elicited  Probability. 

(4)  Heuristic  Search. 

b. Ability to Model Subjective Decision Criteria. The Adaptive
Decision Analysis Model was developed specifically to handle subjective
criteria and can even capture them automatically through training. With
the Elicited Probability approach, subjective weights could be applied
to the output, but more research would have to be done to find a way to
obtain them by automatic training. The Production Rules approach can
capture subjective decision criteria of experts by embedding them in the
productions themselves, but as with the Elicited Probability approach,
it takes a deliberate effort. The Heuristic Search approach cannot
represent subjective criteria directly. The rank order, starting with
the approach with the greatest ability for modeling subjective criteria,
is:


(1)  Adaptive  Decision  Analysis. 

(2)  Elicited  Probability. 
(3) Production Rules.

(4)  Heuristic  Search. 

c. Modeling Continuous Behavior. All of the approaches select
discrete alternatives as their output; however, this decision making
function can be separated from the actual calculation of the continuous
variables. Thus, the decision model will select among several functions
that will perform the actual trajectory calculation. Adaptive Decision
modeling is the only approach which accepts continuous criteria as an
input. The Elicited Probability and Adaptive Decision approaches give a
continuously variable value associated with the output. Heuristic
Search involves a traversal of a tree of discrete nodes. The criteria
for selecting a node may be continuous, but they are based on the state
at the parent node, which is a unique node. Production Rules combine
discretely defined logical statements to select discrete outcomes. The
ranking of the four approaches (best first) for this attribute is as
follows:

(1)  Adaptive  Decision  Analysis. 

(2)  Elicited  Probability. 

(3)  Heuristic  Search. 

(4) Production Rules.

d.  Modeling  the  Flow  of  Control.  Traditionally,  the  flow  of  control 
in  a  simulation  program  was  imbedded  in  the  control  structure  of  the 
implementation  language.  This  method  is  always  available  as  a  last  resort. 
By  including  a  network  of  states  in  the  production  rule  system  the  control 
flow  can  be  made  explicit.  This  avoids  dependency  on  hard  coded  logic  and 
makes  the  flow  of  control  flexible,  visible,  and  easy  to  modify.  In  the 
Heuristic  Search  approach  the  flow  of  control  is  rigidly  built  into  the 
state  space  and  the  evaluation  function,  making  changes  more  awkward.  The 
Elicited  Probability  approach  represents  flow  of  control  indirectly  in 
that the behavior created has an orderly sequence. The Adaptive Decision
Analysis approach addresses mainly the actual decision points, and the
flow of control has to be provided by external mechanisms. The rank
order, starting with the most explicit and flexible flow of control, is:

(1)  Production  Rules. 

(2)  Heuristic  Search. 

(3)  Elicited  Probability. 

(4)  Adaptive  Decision  Analysis. 
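
As an illustration of making the flow of control explicit through a network
of states, each production below names the state in which it applies and the
state it leads to. The state names, conditions, and actions are hypothetical
and are not taken from the report.

```python
from typing import List, Optional, Set, Tuple

# (current state, required fact, action, next state). All names are assumed.
Transition = Tuple[str, str, str, str]

TRANSITIONS: List[Transition] = [
    ("search", "contact made", "classify contact", "classify"),
    ("classify", "contact is hostile", "begin approach", "approach"),
    ("classify", "contact is friendly", "resume search", "search"),
    ("approach", "within range", "attack", "attack"),
]


def step(state: str, facts: Set[str]) -> Tuple[Optional[str], str]:
    """Return (action, next state); stay in place if no production applies."""
    for current, condition, action, nxt in TRANSITIONS:
        if current == state and condition in facts:
            return action, nxt
    return None, state


state = "search"
for observed in ({"contact made"}, {"contact is hostile"}, {"within range"}):
    action, state = step(state, observed)
    print(action, "->", state)
```

Because the sequence lives in the table rather than in program logic,
changing the flow means editing rows, not code.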

e. Modeling AND and OR Conditions. Only the Production Rules
approach explicitly models AND and OR input conditions. In order to
model AND and OR conditions with the Elicited Probability approach, it
is necessary to define an input state which is determined from logical
conditions. Thus the AND's and OR's tend to be hard coded into the
program which generates the input state. This may complicate the
dependency problem. The Adaptive Decision Analysis approach has similar
but more severe problems in dealing with AND and OR conditions. With
Heuristic Search there would be a separate node for every possible
combination of AND and OR conditions. One way to include AND and OR
conditions would be to use a Production Rule approach to select from the
other three approaches as sub-models (i.e., combine approaches). The
rank order of the approaches is:

(1)  Production  Rules. 

(2) Elicited Probabilities (distant second).

(3)  Adaptive  Decision  Analysis. 

(4)  Heuristic  Search. 
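
A logically structured criterion of the kind discussed in item e can be
written as nested AND/OR terms and evaluated recursively. The tuple
encoding below is an assumption of this sketch; the condition itself
follows the self-dominance rule in the production rule figure.

```python
from typing import Set, Tuple, Union

# A condition is a fact name, or a tuple ("AND" | "OR", cond, cond, ...).
Condition = Union[str, Tuple]

SELF_DOMINATES: Condition = (
    "OR",
    ("AND", "one enemy sub in area", "self in deep water",
     "enemy sub of same type"),
    ("AND", "enemy surface ship alone", "no ASW in air"),
)


def holds(cond: Condition, facts: Set[str]) -> bool:
    """Evaluate a nested AND/OR condition against the known facts."""
    if isinstance(cond, str):
        return cond in facts
    op, *args = cond
    if op == "AND":
        return all(holds(a, facts) for a in args)
    if op == "OR":
        return any(holds(a, facts) for a in args)
    raise ValueError(f"unknown operator: {op!r}")


print(holds(SELF_DOMINATES, {"enemy surface ship alone", "no ASW in air"}))
print(holds(SELF_DOMINATES, {"enemy surface ship alone"}))
```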

f. Modeling Probabilities. The Elicited Probability approach
generates probabilistic outputs and considers the probabilities of the
input states, but explicit probabilities as input state variables are
not modeled. With the Adaptive Decision Analysis approach, explicit
probabilities as inputs can be handled, but the outputs are not
probabilistic. With Production Rules, a probability may be associated
with the output, and input probabilities can be handled as with the
Elicited Probability approach described above. Heuristic Search cannot
handle probabilities directly. With the approaches which do not
explicitly use probabilistic inputs, it is still possible to implicitly
represent probabilistic inputs by expanding states into sub-states which
have a probability as part of the state definition or by breaking the
probabilistic variables into several discrete ranges. This is clumsy,
however, because it increases the size of the state space. The rank
order of how well the four approaches model probabilities is:

(1)  Adaptive  Decision  Analysis. 

(2)  Elicited  Probability. 

(3)  Production  Rules. 

(4)  Heuristic  Search. 
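
The workaround of breaking a probabilistic variable into discrete ranges,
and the Monte Carlo selection of outputs mentioned among the modeling
attributes, might look like the sketch below. The range boundaries, labels,
and state names are arbitrary assumptions.

```python
import bisect
import random
from typing import Dict, List, Tuple

EDGES: List[float] = [0.33, 0.66]            # assumed range boundaries
LABELS: List[str] = ["low", "medium", "high"]


def bucket(p: float) -> str:
    """Map a probability to one of a few discrete ranges."""
    return LABELS[bisect.bisect_right(EDGES, p)]


def expand_states(base_states: List[str]) -> List[Tuple[str, str]]:
    """Each base state becomes one sub-state per probability range."""
    return [(s, label) for s in base_states for label in LABELS]


def monte_carlo_select(probabilities: Dict[str, float],
                       rng: random.Random) -> str:
    """Draw one output alternative according to its probability."""
    names = list(probabilities)
    weights = [probabilities[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]


print(bucket(0.5))
print(len(expand_states(["contact", "no contact"])))
print(monte_carlo_select({"attack": 0.7, "evade": 0.3}, random.Random(0)))
```

Two base states become six sub-states here, which is the growth in state
space that makes this workaround clumsy.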

g. Conciseness of Representation. In a sense this is relative to
the application. Each model could be the most concise for modeling a
problem ideally suited for that approach. As a general measure of
conciseness we can consider the number of parameters needed to represent
behavior. Here, conciseness should not be confused with precision. We
assume the more concise model has fewer parameters. The Adaptive Decision
Analysis model represents behavior with only four to seven attribute
weights, and it is necessary to calculate the same number of attribute
levels for each action alternative. The Elicited Probability approach
has a column of elicited probabilities for each alternative. Each column
has one entry for each state considered in making a decision. The
Production Rule approach uses one or more logical structures for each
action alternative. The truth or falsity of each operand must be
evaluated. Heuristic Search has nodes corresponding to the number of
possible combinations of input states. A heuristic function and a
pruning function must also be evaluated. The rank order of the
approaches (most concise first) is as follows:

(1)  Adaptive  Decision  Analysis. 

(2)  Elicited  Probability. 

(3)  Production  Rules. 

(4)  Heuristic  Search. 
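
To show how few parameters the Adaptive Decision Analysis representation
needs, the sketch below scores each action alternative as a weighted sum of
its attribute levels, which is the usual additive form of a multi-attribute
utility model. The three attributes, the weights, and the levels are made
up for illustration.

```python
from typing import Dict, List

# Assumed attributes: (mission progress, own safety, stealth).
WEIGHTS: List[float] = [0.5, 0.25, 0.25]

# Attribute levels per action alternative; values are illustrative only.
ALTERNATIVES: Dict[str, List[float]] = {
    "attack": [1.0, 0.0, 0.5],
    "evade": [0.0, 1.0, 1.0],
}


def utility(weights: List[float], levels: List[float]) -> float:
    """Additive multi-attribute utility: sum of weight times level."""
    return sum(w * x for w, x in zip(weights, levels))


def best_action(weights: List[float],
                alternatives: Dict[str, List[float]]) -> str:
    return max(alternatives, key=lambda a: utility(weights, alternatives[a]))


for name, levels in ALTERNATIVES.items():
    print(name, utility(WEIGHTS, levels))
print(best_action(WEIGHTS, ALTERNATIVES))
```

The whole behavior is carried by the weight vector; only the attribute
levels have to be recomputed for each alternative.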

h.  Adaptiveness.  Only  the  Adaptive  Decision  Analysis  approach  is 
adaptive  in  real  time. 
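
One way on-line adaptation of attribute weights could work is an
error-correcting update: when the model's predicted action differs from the
operator's actual choice, the weights are moved toward the chosen
alternative. This perceptron-style rule is an assumption made for
illustration and is not claimed to be the training algorithm used in the
Adaptive Decision Analysis model.

```python
from typing import Dict, List


def utility(weights: List[float], levels: List[float]) -> float:
    return sum(w * x for w, x in zip(weights, levels))


def predict(weights: List[float], alternatives: Dict[str, List[float]]) -> str:
    """Model's choice: the alternative with the highest weighted sum."""
    return max(alternatives, key=lambda a: utility(weights, alternatives[a]))


def train_step(weights: List[float], alternatives: Dict[str, List[float]],
               chosen: str, rate: float = 0.1) -> List[float]:
    """Adjust weights only when the prediction disagrees with the operator."""
    predicted = predict(weights, alternatives)
    if predicted == chosen:
        return list(weights)
    return [w + rate * (c - p)
            for w, c, p in zip(weights, alternatives[chosen],
                               alternatives[predicted])]


ALTERNATIVES = {"attack": [1.0, 0.0], "evade": [0.0, 1.0]}
weights = [0.5, 0.5]
print(predict(weights, ALTERNATIVES))           # model's initial choice
weights = train_step(weights, ALTERNATIVES, chosen="evade")
print(predict(weights, ALTERNATIVES))           # choice after one correction
```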

i. Dependencies of Input States. The Elicited Probability and
Adaptive Decision Analysis approaches both assume independent input
states. In both cases it is common practice to assume independence as
a working assumption even when it is not strictly true. The methods of
overcoming this problem are basically the same in both cases. The
Production Rule and Heuristic Search techniques do not make an
independence assumption and are therefore not affected by this problem.
The rank order (most favorable first) for this attribute is:

(1)  Production  Rules  and  Heuristic  Search. 

(2)  Elicited  Probability  and  Adaptive  Decision  Analysis. 
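
The independence assumption can be made concrete with a small Bayesian
calculation: if the input states are treated as independent given the
action, their elicited conditional probabilities simply multiply. The
numbers and names below are invented, and the sketch is not claimed to be
the report's exact Elicited Probability algorithm.

```python
from typing import Dict, List

PRIORS: Dict[str, float] = {"attack": 0.5, "evade": 0.5}

# Assumed elicited values of P(input state | action).
LIKELIHOODS: Dict[str, Dict[str, float]] = {
    "attack": {"contact held": 0.8, "deep water": 0.5},
    "evade": {"contact held": 0.2, "deep water": 0.5},
}


def posterior(priors: Dict[str, float],
              likelihoods: Dict[str, Dict[str, float]],
              observed: List[str]) -> Dict[str, float]:
    """P(action | observed states), assuming the states are independent."""
    scores: Dict[str, float] = {}
    for action, prior in priors.items():
        score = prior
        for state in observed:
            score *= likelihoods[action][state]  # independence: just multiply
        scores[action] = score
    total = sum(scores.values())  # assumed non-zero for the inputs used here
    return {action: s / total for action, s in scores.items()}


print(posterior(PRIORS, LIKELIHOODS, ["contact held", "deep water"]))
```

If the two input states were in fact correlated, multiplying them would
count the same evidence twice, which is the dependency problem noted above.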

j. Auxiliary Payoffs. The auxiliary payoffs for each approach
are as follows:

(1)  Production  Rules.  Ability  to  explain  reasoning  leading  to 
the  selected  action  alternatives.  Similarity  of  the  representation  to 
the  human  thought  process. 

(2) Adaptive Decision Analysis. Relative desirability of
alternatives is available. A good collection of performance measures
has been developed to go with this approach.

(3) Elicited Probabilities. A simulated intelligence expert
can readily be made.

(4) Heuristic Search. This approach most directly simulates
the process of "thinking ahead" or contemplating a sequence of possible
moves and counter moves.

The rank order depends upon which auxiliary payoffs are appropriate
for the particular application. By number of auxiliary payoffs
available (largest number first), it is:

(1)  Adaptive  Decision  Modeling. 

(2)  Elicited  Probability. 

(3)  Production  Rules. 

(4)  Heuristic  Search. 

EVALUATION  BY  DEVELOPMENT  ATTRIBUTES 

a. Scenario Set-Up Time. With the Adaptive Decision Analysis
approach, the instructor would act out the desired scenario in an
operational setting and the behavior would be learned by the model. It may
take a while for the model to converge, and consistent behavior is
required for the model to train. Compared to other methods, the time
would be spent doing the normal task rather than struggling with concepts
which may be unnatural. The Elicited Probability approach requires that
the instructor estimate a number of probabilities, view the resultant
behavior, and make fine tuning changes. The Production Rules approach
requires the specification of new or modified productions relevant to the
new behavior. The Heuristic Search approach requires changes to the
heuristic function and possibly the node definition. This may be very
difficult. The rank order (starting with the shortest time) is:

(1)  Adaptive  Decision  Analysis. 

(2)  Production  Rules. 

(3)  Elicited  Probability. 

(4)  Heuristic  Search. 

b.  Required  Development  and  Implementation  Time  and  Cost.  This  is 
a  very  difficult  attribute  to  estimate.  Each  approach  has  aspects  which 
are  easy  and  those  which  are  hard.  The  following  rank  order  (quickest 
and  cheapest  first)  is  biased  by  previous  experience  Perceptronics  has 
had  with  these  models: 

(1)  Elicited  Probability. 

(2)  Production  Rules. 

(3)  Adaptive  Decision  Analysis. 

(4)  Heuristic  Search. 

c. Required Integration Time and Cost. Since integration difficulty
is dependent on the amount of interfacing with the existing system, and
the amount of interfacing is dependent on inputs, outputs, and data areas
needed (which are roughly the same for all approaches), there is no basis
at present for rating one approach above any other.

d.  Vulnerability  to  Increase  in  Size  of  the  State  Space.  The 
Adaptive  Decision  modeling  approach  is  the  least  vulnerable  to  increase 
in  the  size  of  the  state  space.  This  is  because  a  small  number  of 
attributes  are  used  and  their  number  does  not  increase.  The  only  effect 
an  increase  in  the  size  of  the  state  space  has  is  to  make  it  more  involved 
to  calculate  the  attribute  levels. 

The  Elicited  Probability  approach  could  also  stay  the  same  size  as 
the  state  space  size  increases;  however,  it  would  probably  be  a  practical 
necessity  to  increase  the  number  of  parameters  or  to  put  more  model  levels 
in  the  hierarchy. 

The  Production  Rules  and  Heuristic  Search  approaches  are  potentially 
extremely  vulnerable  to  increase  in  the  size  of  the  state  space.  In 
the  case  of  the  Production  Rules  approach,  the  number  of  additional 
Production  Rules  needed  is  likely  to  increase  faster  than  the  size  of  the 
state  space.  Heuristic  Search  is  the  most  vulnerable,  since  its  complexity 
increases  as  a  combinatorial  function  of  the  size  of  the  state  space. 

Here  is  the  rank  order  (best  first): 

(1)  Adaptive  Decision  Modeling. 

(2)  Elicited  Probability. 

(3)  Production  Rules. 

(4)  Heuristic  Search. 
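
The growth that makes Heuristic Search the most vulnerable can be seen with
simple counting. The sketch assumes a state is one value for each of several
discrete variables and that a search tree has a uniform branching factor;
both are simplifications introduced here.

```python
from math import prod
from typing import List


def state_count(cardinalities: List[int]) -> int:
    """Number of distinct states when each variable takes a fixed number of values."""
    return prod(cardinalities)


def tree_nodes(branching: int, depth: int) -> int:
    """Nodes in a full search tree: 1 + b + b^2 + ... + b^depth."""
    return sum(branching ** d for d in range(depth + 1))


print(state_count([2, 3, 4]))       # three variables
print(state_count([2, 3, 4, 5]))    # one more variable with five values
print(tree_nodes(3, 2), tree_nodes(3, 6))
```

Adding a single variable multiplies the state count, and each extra level
of look-ahead multiplies the number of nodes to be examined.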

EVALUATION  BY  PERFORMANCE  ATTRIBUTES 


a. Instructor Time Needed for Operation. Most of the factors
affecting this are probably independent of the model itself except for
those things discussed earlier under "Instructor time needed to set up
problem scenario." There should probably be some interface programs
which help transfer information and control from the instructor to the
models, and information back to the instructor from the models.

b. Instructor Control. When the instructor assumes control from
the model and vice versa, steps must be taken to ensure smooth transitions.
This means that all of the state variables needed by the models must be
maintained. Also, the state changes created by the models must be
updated in the existing software. Furthermore, when control is returned
from the instructor to the automatic opponent, the specifics of the
opponent state must be provided. This attribute is nearly independent
of the model approach; however, in general there is greater difficulty
with a more complicated model.

c. Required Computer Resources. Computer resources are a function
of how much detail each decision is modeled in. In general, the rank
order (best first) is as follows:

(1)  Adaptive  Decision  Analysis. 

(2)  Elicited  Probability. 

(3)  Production  Rules. 

(4)  Heuristic  Search. 

d. Capability for Including Performance Measures and Evaluation. A
lot of development has gone into performance measures with the Adaptive
Decision Analysis approach. Performance measures have not been developed
with the other approaches.

In the applications where performance measures have been developed,
the adaptive model was used to model the trainee, whereas in the present
application it is the instructor who is adaptively modeled. The power
of  the  performance  measures  is  derived  from  the  adaptive  model  of  the 
trainee.  The  reason  for  this  is  that  the  model  of  the  trainee  represents 
the  current  state  of  knowledge  and  skill  of  the  trainee  and  performance 
measures  are  based  on  an  analysis  of  model  parameters.  The  performance 
measures  made  possible  by  modeling  the  trainee  include  the  following: 

(1)  Decision  consistency. 

(2) Comparison of trainee values with expert values.

(3)  Use  of  the  trainee  values  to  drive  a  simulation  to  compare 
the  behavior  created  by  the  trainee's  values  to  behavior  created  by  other 
sets  of  values. 

(4) Use the trainee values as they are to characterize the
trainee.

In  addition  to  performance  measures  based  on  adaptively  modeling  the 
trainee,  the  following  measures  have  been  developed: 

(1)  Evaluate  trainee's  skill  at  purchasing  information. 

(2)  Compare  the  trainee's  decision  with  the  decision  the  expert 
would  make  (as  indicated  by  an  expert  model  with  corresponding  values). 

(3)  Measure  decision  time. 

(4) Define a way to score the task elements such that a score
results from each session (this measure is more powerful when used with
an adaptive trainee model).

(5)  Compile  statistics  on  the  trainee's  frequency  of  making 
various  decisions  and  compare  these  with  expert  statistics. 
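
The last measure in the list, comparing the trainee's decision frequencies
with expert statistics, could be computed as sketched below. The decision
labels are invented, and the use of total variation distance as the
comparison is an assumption; the report does not say how the two sets of
statistics are compared.

```python
from collections import Counter
from typing import Dict, List


def frequencies(decisions: List[str]) -> Dict[str, float]:
    """Relative frequency of each decision in a session log."""
    counts = Counter(decisions)
    total = len(decisions)
    return {d: n / total for d, n in counts.items()}


def frequency_gap(trainee: List[str], expert: List[str]) -> float:
    """Total variation distance: 0.0 is identical, 1.0 is disjoint."""
    ft, fe = frequencies(trainee), frequencies(expert)
    labels = set(ft) | set(fe)
    return 0.5 * sum(abs(ft.get(d, 0.0) - fe.get(d, 0.0)) for d in labels)


TRAINEE = ["attack", "attack", "evade", "evade"]
EXPERT = ["attack", "attack", "attack", "evade"]

print(frequencies(TRAINEE))
print(frequency_gap(TRAINEE, EXPERT))
```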

As envisioned previously, the adaptivity is used to model the
instructor acting as the opponent; the trainee was not modeled. However,
if good performance measures are important, it would be good to model the
trainee as well. The algorithms to do this would be available in the
software since they would have been developed to model the instructor.
Much of the interfacing to model the trainee must also be done anyway.
The main complication in adding the capability to also model the trainee
is the fact that, to be valid, the attribute levels should be displayed
to the trainee. This changes the task as it appears to the trainee.

e.  Real  World  Fidelity.  Each  model  has  the  highest  real  world 
fidelity  when  applied  in  an  area  most  suited  for  it. 

Table 3 summarizes the conclusions of this chapter.

TABLE 3. MODEL EVALUATION BY DIFFERENT CRITERIA

SECTION VI


MODEL  EVALUATION  FOR  SPECIFIC  DECISIONS 


GENERAL 

In the preceding section each of the models was evaluated against a list
of general attributes. In this section, we present several specific
decisions that a submarine CO has to make and discuss the applicability
of each model. It must be kept in mind, however, that no decision stands
alone: the control process that determines what has to be considered next,
and what action options are available at that point, is as important as
the making of the decision itself.

For  each  of  the  decisions  described  below  a  simple  description  of 
the  decision  is  given  and  then  the  various  approaches  are  rank  ordered 
according  to  their  suitability. 

CONTACT  DECISION 

This is a protracted decision which dramatically influences the CO's
behavior. It has to be continued even after a positive contact is made,
both to maintain the contact and to retract the "contact made" decision if
new evidence indicates that the initial decision was erroneous. Time enters
the decision in that the probability of positive contact increases if a
noise is repeated or is detected over a longer period. Additional
considerations are the level of background sea noise in the given weather,
the closeness to enemy sea operations, previous intelligence information,
etc.


Some of these decision variables are internal to the model and some
are inputs generated by the friendly forces or the sea. The external
signals have to be preprocessed and transformed into variables acceptable
to the decision model. A probabilistic output is desirable. A recommended
rank order of the approaches is the following:

a. Elicited Probability. This model takes the available a priori
probabilities and can update them incrementally as new evidence comes
in. The output is compared to a threshold to decide whether or not to
declare "contact." The conditional probabilities in the transformation
matrix represent an opponent's ability to diagnose noises and aggregate
clues. These probabilities can be changed to simulate different opponent
skill levels and even levels of conservatism. Furthermore, changing the
threshold is a simple mechanism for adjusting the opponent's conservatism.
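
The incremental update and threshold test described above can be sketched
as follows. This is a minimal illustration, assuming a two-hypothesis
(contact / no contact) form of Bayes' rule; the observation names, the
likelihood values, and the function names are hypothetical and are not
taken from the study.

```python
def update_contact_probability(prior, p_obs_given_contact, p_obs_given_no_contact):
    """Apply Bayes' rule once to revise P(contact) after a single observation."""
    numerator = p_obs_given_contact * prior
    evidence = numerator + p_obs_given_no_contact * (1.0 - prior)
    return numerator / evidence


def contact_decision(prior, observations, likelihoods, threshold):
    """Fold observations into P(contact) one at a time, then apply the threshold."""
    p = prior
    for obs in observations:
        p_contact, p_no_contact = likelihoods[obs]
        p = update_contact_probability(p, p_contact, p_no_contact)
    return p, p >= threshold


# Hypothetical elicited values: (P(obs | contact), P(obs | no contact)).
LIKELIHOODS = {
    "faint_noise": (0.6, 0.3),
    "repeated_noise": (0.8, 0.1),
}

p, declared = contact_decision(
    0.1, ["faint_noise", "repeated_noise"], LIKELIHOODS, threshold=0.5
)
```

With these assumed numbers the probability of contact rises from 0.1 to
about 0.64, so contact is declared at a threshold of 0.5 but not at a more
conservative threshold of 0.7.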

b. Adaptive Decision Analysis. The input consists of attributes of
the noise state scaled such that a high attribute level means "contact."
An expert's weights for each attribute are learned. An expected value
is computed which represents the likelihood of contact. Contact is
declared when this value exceeds a pre-set threshold.
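
A minimal sketch of this scheme follows. It assumes that the expected
value is a weighted sum of attribute levels and that the weights are
adjusted by a simple error-correction rule whenever the model disagrees
with the expert's recorded decision. The attribute set, training examples,
learning rate, and function names are illustrative assumptions only.

```python
def contact_score(weights, attributes):
    """Weighted sum of attribute levels; a higher score is more contact-like."""
    return sum(w * a for w, a in zip(weights, attributes))


def declare_contact(weights, attributes, threshold):
    """Declare contact when the score exceeds the pre-set threshold."""
    return contact_score(weights, attributes) > threshold


def train_weights(weights, examples, threshold, rate=0.1, passes=20):
    """Assumed error-correction rule: change the weights only when the
    model's decision differs from the expert's decision."""
    w = list(weights)
    for _ in range(passes):
        for attributes, expert_said_contact in examples:
            if declare_contact(w, attributes, threshold) != expert_said_contact:
                sign = 1.0 if expert_said_contact else -1.0
                w = [wi + sign * rate * a for wi, a in zip(w, attributes)]
    return w


# Hypothetical attributes, each scaled to [0, 1]:
# (noise level, noise duration, proximity to enemy operations).
EXAMPLES = [
    ((0.9, 0.8, 0.7), True),
    ((0.2, 0.1, 0.3), False),
    ((0.7, 0.9, 0.2), True),
    ((0.3, 0.2, 0.1), False),
]

learned = train_weights([0.0, 0.0, 0.0], EXAMPLES, threshold=0.5)
```

After training on these four assumed examples, the learned weights
reproduce each of the expert's decisions at the same threshold.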

c. Production Rules. The various considerations suggesting a contact
can be incorporated into ascending states. Productions triggered by noise
type and level can "vote" to move the state to one of increased probability
of contact.
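
The voting mechanism can be illustrated with a short sketch. The state
names, the three productions, and the rule of moving at most one state per
observation are assumptions made for the example; the downward vote is
included only to show how a "contact made" decision might be retracted.

```python
STATES = ["no_contact", "possible_contact", "probable_contact", "contact"]

# Hypothetical productions: (condition on the observation, vote).
PRODUCTIONS = [
    (lambda obs: obs["noise_type"] == "propeller", +1),
    (lambda obs: obs["noise_level"] >= 0.7, +1),
    (lambda obs: obs["noise_type"] == "biological", -1),
]


def next_state(state_index, obs):
    """Tally the votes of all triggered productions and move one state up or down."""
    votes = sum(vote for condition, vote in PRODUCTIONS if condition(obs))
    move = (votes > 0) - (votes < 0)  # +1, 0, or -1
    return min(max(state_index + move, 0), len(STATES) - 1)


state = 0
for obs in [
    {"noise_type": "propeller", "noise_level": 0.8},
    {"noise_type": "propeller", "noise_level": 0.4},
    {"noise_type": "biological", "noise_level": 0.3},
]:
    state = next_state(state, obs)
```

In this run two propeller noises raise the state twice and a biological
noise lowers it once, leaving the state at "possible_contact".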

d. Heuristic Search. Heuristic Search would be appropriate only if
the order in which different noises occur were the predominant identifying
characteristic.

THREAT  DECISION 

The threat decision is more an interpretation of external events than
a classification of fixed patterns. It considers the mission, state of
war, location relative to the enemy, noises detected, and the number,
location, and motions of potential threats. A simple breakdown of the
different considerations follows:


               Nationality        Location     Maneuver                 Etc.

Nothing        Friendly           Near home    Indifferent
Whale          Neutral            Open sea     Moving away
Decoy          Unfriendly/peace   Near enemy   Moving toward
Surface ship   Unfriendly/war                  Positioning for attack
Nuclear sub
etc.

a. Elicited Probabilities. This approach has the flexibility to
include all of the above factors. The a priori probabilities of the various
output conditions (e.g., nature of the threat) can be biased according to
the intelligence information which exists. The monitor's probability
information is discretized and made part of the input state.

b. Production Rules. Because of the large number of contributing
factors involved in this decision, Production Rules can be used to
make an orderly decision. Each production handles a set of factors which
lead to a meaningful conclusion; the conclusion can make other factors
more relevant, new productions are triggered, and so on. In general, the
Production Rule approach is advantageous for formulating tactical
assessment when interpretive consideration is dominant.

c. Adaptive Decision Analysis. With this approach a discriminant
function is used for each possible interpretation. The model can naturally
handle more than one plausible interpretation concurrently. The
continuous time effect is awkward to represent, as are a priori
probabilities such as those derived from intelligence information.

d. Heuristic Search. In a situation where it is necessary to evaluate
a sequence of moves and countermoves in order to determine whether a
threat exists, the Heuristic Search approach can be used. In this case a
threat is a state that can lead to a set of terminal nodes which include
some that are detrimental to the opponent. In other cases, where "look
ahead" is not relevant to the threat evaluation, the method would not be
appropriate.

MANEUVER  SELECTION  DECISION 

The maneuver selection decision is made under several different
circumstances such as evade, attack, track, approach, etc. Each of these
circumstances has a set of relevant maneuvers, one of which has to be
selected. The selecting mechanism can be the same in each case but with a
different set of parameters. The details of the trajectory implementing
the maneuver are worked out by a lower level subroutine that is separate
from the selection decision. Such a subroutine can use a Monte Carlo method
to specify the parameters of the trajectory, guided by the intended
objective of the maneuver.

a. Adaptive Decision Analysis. With this approach the relative
desirability of each possible maneuver is computed. There is one
discriminant function for each maneuver and a set of attributes common to
all maneuvers. This decision was used in Section IV to illustrate the
Adaptive Decision approach.
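
As a brief sketch of this arrangement, assume each discriminant function
is a weighted sum over one common attribute vector and the maneuver with
the highest value is selected. The attribute set, the weights, and the
function name below are hypothetical.

```python
def select_maneuver(weights_by_maneuver, attributes):
    """Evaluate one linear discriminant per maneuver and pick the largest."""
    scores = {
        maneuver: sum(w * a for w, a in zip(weights, attributes))
        for maneuver, weights in weights_by_maneuver.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical common attributes, each scaled to [0, 1]:
# (threat level, detection risk, firing opportunity).
WEIGHTS = {
    "evade": (0.7, 0.6, -0.4),
    "attack": (-0.3, -0.5, 0.9),
    "track": (0.1, -0.2, 0.3),
}

under_threat, _ = select_maneuver(WEIGHTS, (0.9, 0.8, 0.1))
good_shot, _ = select_maneuver(WEIGHTS, (0.2, 0.1, 0.9))
```

With these assumed weights the first situation selects "evade" and the
second selects "attack". A different circumstance would use the same
mechanism with a different weight table.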

b.  Production  Rules.  Production  Rules  are  excellent  for  imposing 
logical  criteria  on  the  maneuver  selection  decision.  Probabilities  can 
be  attached  to  the  Production  Rules,  but  this  increases  their  number. 

c. Elicited Probabilities. By interpreting probabilities as
relative desirability this model can be used to select maneuvers. Each
contributing factor considered increases or decreases the desirability
of the candidate maneuvers. The algorithm aggregates the individual
desirabilities and the highest one is selected. The particulars of the
trajectory are then calculated. This approach is able to handle situations
where there may be a large number of possible maneuvers and many decision
criteria.
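
One way to picture the aggregation is sketched below. It assumes that each
active factor scales the desirability of each maneuver by an elicited
multiplier and that the results are normalized to sum to one; the source
does not specify the aggregation rule, and the factor names and values are
hypothetical.

```python
def aggregate_desirability(base, factor_tables, active_factors):
    """Scale each maneuver's desirability by every active factor, then normalize."""
    scores = dict(base)
    for factor in active_factors:
        for maneuver, multiplier in factor_tables[factor].items():
            scores[maneuver] *= multiplier
    total = sum(scores.values())
    return {maneuver: score / total for maneuver, score in scores.items()}


# Hypothetical starting desirabilities and elicited multipliers.
BASE = {"evade": 0.3, "attack": 0.3, "track": 0.4}
FACTORS = {
    "enemy_closing": {"evade": 2.0, "attack": 1.0, "track": 0.5},
    "weapons_ready": {"evade": 0.5, "attack": 3.0, "track": 1.0},
}

result = aggregate_desirability(BASE, FACTORS, ["enemy_closing", "weapons_ready"])
selected = max(result, key=result.get)
```

With both factors active the assumed multipliers favor "attack"; with only
"enemy_closing" active, "evade" becomes the most desirable maneuver.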


d. Heuristic Search. This approach is not of use unless maneuver
selection appears in the context of a series of maneuvers alternately
selected by both sides.